Preface

This manual contains the solutions to all of the problems in the second edition of my MIT 
Press book, Econometric Analysis of Cross Section and Panel Data. In addition to the 
problems printed in the text, I have included some “bonus problems” along with their 
solutions. Several of these problems I left out due to space constraints and others occurred to
me since the book was published. I have a collection of other problems, with solutions, that I
have used over the past 10 years for problem sets, take-home exams, and in-class exams. I am
happy to provide these to instructors who have adopted the book for a course. 

I solved the empirical examples using various versions of Stata, ranging from 8.0 through 
11.0. I have included the Stata commands and output directly in the text. No doubt there are 
Stata users and users of other software packages who will, at least in some cases, see more 
efficient or more elegant ways to compute estimates and test statistics. 

Some of the solutions are fairly long. In addition to filling in all or most of the algebraic 
steps, I have tried to offer commentary about why a particular problem is interesting, why I 
solved the problem the way I did, or which conclusions would change if we varied some of the 
assumptions. Several of the problems offer what appear to be novel solutions to situations that 
can arise in actual empirical work. 

My progress in finishing this manual was slowed by a health problem in spring and 
summer of 2010. Fortunately, several graduate students came to my aid by either working 
through some problems or organizing the overall effort. I would like to thank Do Won Kwak, 
Cuicui Lu, Myoung-Jin Keay, Shenwu Sheng, Iraj Rahmani, and Monthien Satimanon for their 
able assistance. 


I would appreciate learning about any mistakes in the solutions and also receiving
suggestions for how to make the answers more transparent. Of course I will gladly entertain
suggestions for how the text can be improved, too. I can be reached via email at
wooldri1@msu.edu.


Solutions to Chapter 2 Problems 


2.1. a. Simple partial differentiation gives

$$\frac{\partial E(y|x_1,x_2)}{\partial x_1} = \beta_1 + \beta_4 x_2$$

and

$$\frac{\partial E(y|x_1,x_2)}{\partial x_2} = \beta_2 + 2\beta_3 x_2 + \beta_4 x_1.$$

b. By definition, $E(u|x_1,x_2) = 0$. Because $x_2^2$ and $x_1x_2$ are functions of $(x_1,x_2)$, it does not
matter whether or not we also condition on them: $E(u|x_1,x_2,x_2^2,x_1x_2) = 0$.

c. All we can say about $\mathrm{Var}(u|x_1,x_2)$ is that it is nonnegative for all $x_1$ and $x_2$:
$E(u|x_1,x_2) = 0$ in no way restricts $\mathrm{Var}(u|x_1,x_2)$.

2.2. a. Because $\partial E(y|x)/\partial x = \delta_1 + 2\delta_2(x - \mu)$, the marginal effect of $x$ on $E(y|x)$ is a linear
function of $x$. If $\delta_2$ is negative then the marginal effect is less than $\delta_1$ when $x$ is above its mean.
If, for example, $\delta_1 > 0$ and $\delta_2 < 0$, the marginal effect will eventually be negative for $x$ far
enough above $\mu$. (Whether the values of $x$ such that $\partial E(y|x)/\partial x < 0$ represent an interesting
segment of the population is a different matter.)

b. Because $\partial E(y|x)/\partial x$ is a function of $x$, we take the expectation of $\partial E(y|x)/\partial x$ over the
distribution of $x$: $E[\partial E(y|x)/\partial x] = E[\delta_1 + 2\delta_2(x - \mu)] = \delta_1 + 2\delta_2E[(x - \mu)] = \delta_1$.

c. One way to do this part is to apply Property LP.5 from Appendix 2A. We have

$$\begin{aligned}
L(y|1,x) = L[E(y|x)|1,x] &= \delta_0 + \delta_1 L[(x-\mu)|1,x] + \delta_2 L[(x-\mu)^2|1,x] \\
&= \delta_0 + \delta_1(x-\mu) + \delta_2(\gamma_0 + \gamma_1 x),
\end{aligned}$$

because $L[(x-\mu)|1,x] = x - \mu$ and $\gamma_0 + \gamma_1 x$ is the linear projection of $(x-\mu)^2$ on $x$. By
assumption, $(x-\mu)^2$ and $x$ are uncorrelated, and so $\gamma_1 = 0$. It follows that

$$L(y|x) = (\delta_0 - \delta_1\mu + \delta_2\gamma_0) + \delta_1 x.$$
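The conclusions of Problem 2.2 can be checked numerically. The following is a minimal sketch, not part of the original solution: the parameter values and the five-point support for $x$ are hypothetical, chosen so that the distribution is symmetric about its mean (which makes $(x-\mu)^2$ and $x$ uncorrelated). Exact rational arithmetic is used so the equalities hold without rounding.

```python
from fractions import Fraction as F

# Hypothetical parameter values, chosen only for illustration.
d0, d1, d2 = F(1), F(2), F(-1, 2)

# Hypothetical distribution of x: equally likely points, symmetric about 3,
# so that (x - mu)^2 and x are uncorrelated, as the problem assumes.
support = [F(v) for v in (1, 2, 3, 4, 5)]
p = F(1, len(support))


def expect(f):
    """Expectation of f(x) over the discrete distribution of x."""
    return sum(p * f(x) for x in support)


mu = expect(lambda x: x)


def cond_mean(x):
    """E(y|x) = d0 + d1*(x - mu) + d2*(x - mu)^2."""
    return d0 + d1 * (x - mu) + d2 * (x - mu) ** 2


# Part b: average partial effect, E[d1 + 2*d2*(x - mu)].
ape = expect(lambda x: d1 + 2 * d2 * (x - mu))

# Part c: linear projection of E(y|x) on (1, x).
var_x = expect(lambda x: (x - mu) ** 2)
slope = expect(lambda x: (x - mu) * cond_mean(x)) / var_x
intercept = expect(cond_mean) - slope * mu

# With gamma1 = 0, the projection of (x - mu)^2 on (1, x) is the constant gamma0.
gamma0 = var_x

print(ape, slope, intercept)
```

Under these assumed values the average partial effect and the projection slope both equal $\delta_1$, and the projection intercept equals $\delta_0 - \delta_1\mu + \delta_2\gamma_0$.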


2.3. a. $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2 + u$, where $u$ has a zero mean given $x_1$ and $x_2$:
$E(u|x_1,x_2) = 0$. We can say nothing further about $u$.

b. $\partial E(y|x_1,x_2)/\partial x_1 = \beta_1 + \beta_3x_2$. Because $E(x_2) = 0$, $\beta_1 = E[\partial E(y|x_1,x_2)/\partial x_1]$, that is, $\beta_1$ is
the average partial effect of $x_1$ on $E(y|x_1,x_2)$. Similarly, $\beta_2 = E[\partial E(y|x_1,x_2)/\partial x_2]$.

c. If $x_1$ and $x_2$ are independent with zero mean then $E(x_1x_2) = E(x_1)E(x_2) = 0$. Further,
the covariance between $x_1x_2$ and $x_1$ is $E(x_1x_2 \cdot x_1) = E(x_1^2x_2) = E(x_1^2)E(x_2)$ (by
independence) $= 0$. A similar argument shows that the covariance between $x_1x_2$ and $x_2$ is zero.
But then the linear projection of $x_1x_2$ onto $(1,x_1,x_2)$ is identically zero. Now just use the law
of iterated projections (Property LP.5 in Appendix 2A):

$$\begin{aligned}
L(y|1,x_1,x_2) &= L(\beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2|1,x_1,x_2) \\
&= \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3L(x_1x_2|1,x_1,x_2) \\
&= \beta_0 + \beta_1x_1 + \beta_2x_2.
\end{aligned}$$

d. Equation (2.47) is more useful because it allows us to compute the partial effects of $x_1$
and $x_2$ at any values of $x_1$ and $x_2$. Under the assumptions we have made, the linear projection
in (2.48) does have as its slope coefficients on $x_1$ and $x_2$ the partial effects at the population
average values of $x_1$ and $x_2$ (zero in both cases), but it does not allow us to obtain the partial
effects at any other values of $x_1$ and $x_2$. Incidentally, the main conclusions of this problem go
through if we allow $x_1$ and $x_2$ to have nonzero population means.


2.4. By assumption,

$$E(u|x,v) = \delta_0 + x\delta + \rho_1 v$$

for some scalars $\delta_0$, $\rho_1$ and a column vector $\delta$. Now, it suffices to show that $\delta_0 = 0$ and $\delta = 0$.
One way to do this is to use LP.7 in Appendix 2A, and in particular, equation (2.56). This says
that $(\delta_0,\delta')'$ can be obtained by first projecting $(1,x)$ onto $v$, and obtaining the population
residual, $r$. Then, project $u$ onto $r$. Now, since $v$ has zero mean and is uncorrelated with $x$, the
first step projection does nothing: $r = (1,x)$. Thus, projecting $u$ onto $r$ is just projecting $u$ onto
$(1,x)$. Since $u$ has zero mean and is uncorrelated with $x$, this projection is identically zero,
which means that $\delta_0 = 0$ and $\delta = 0$.

2.5. By definition and the zero conditional mean assumptions, $\mathrm{Var}(u_1|x,z) = \mathrm{Var}(y|x,z)$
and $\mathrm{Var}(u_2|x) = \mathrm{Var}(y|x)$. By assumption, these are constant and necessarily equal to
$\sigma_1^2 \equiv \mathrm{Var}(u_1)$ and $\sigma_2^2 \equiv \mathrm{Var}(u_2)$, respectively. But then Property CV.4 implies that $\sigma_2^2 \geq \sigma_1^2$.
This simple conclusion means that, when error variances are constant, the error variance falls
as more explanatory variables are conditioned on.

2.6. a. By linearity of the linear projection,

$$L(q|1,x) = L(q^*|1,x) + L(e|1,x) = L(q^*|1,x),$$

where the last equality follows because $L(e|1,x) = 0$ when $E(e) = 0$ and $E(x'e) = 0$.
Therefore, the parameters in the linear projection of $q$ onto $(1,x)$ are the same as the linear
projection of $q^*$ onto $(1,x)$. This fact is useful for studying equations with measurement error
in the explained or explanatory variables.

b. $r \equiv q - L(q|1,x) = (q^* + e) - L(q|1,x) = (q^* + e) - L(q^*|1,x)$ (from part a)
$= [q^* - L(q^*|1,x)] + e = r^* + e$.


2.7. Write the equation in error form as

$$y = g(x) + z\beta + u, \qquad E(u|x,z) = 0.$$

Take the expected value of the first equation conditional only on $x$:

$$E(y|x) = g(x) + [E(z|x)]\beta$$

and subtract this from the first equation to get

$$y - E(y|x) = [z - E(z|x)]\beta + u$$

or

$$\tilde{y} = \tilde{z}\beta + u.$$

Because $\tilde{z}$ is a function of $(x,z)$, $E(u|\tilde{z}) = 0$ [since $E(u|x,z) = 0$], and so $E(\tilde{y}|\tilde{z}) = \tilde{z}\beta$.

This basic result is fundamental in the literature on estimating partial linear models. First,
one estimates $E(y|x)$ and $E(z|x)$ using very flexible methods (typically, nonparametric
methods). Then, after obtaining residuals of the form $\tilde{y}_i \equiv y_i - \hat{E}(y_i|x_i)$ and $\tilde{z}_i \equiv z_i - \hat{E}(z_i|x_i)$, $\beta$
is estimated from an OLS regression of $\tilde{y}_i$ on $\tilde{z}_i$, $i = 1,\ldots,N$. Under general conditions, this kind
of nonparametric partialling-out procedure leads to a $\sqrt{N}$-consistent, asymptotically normal
estimator of $\beta$. See Robinson (1988) and Powell (1994).

In the case where $E(y|x)$ and the elements of $E(z|x)$ are approximated as linear functions of
a common set of functions, say $\{h_1(x),\ldots,h_Q(x)\}$, the partialling out is equivalent to
estimating a linear model

$$y = \alpha_0 + \alpha_1h_1(x) + \ldots + \alpha_Qh_Q(x) + z\beta + error$$

by OLS.

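2.8. a. By exponentiation we can write $y = \exp[g(x) + u] = \exp[g(x)]\exp(u)$. It follows
that

$$E(y|x) = \exp[g(x)]E[\exp(u)|x] = \exp[g(x)]a(x).$$

Using the product rule gives

$$\begin{aligned}
\frac{\partial E(y|x)}{\partial x_j} &= \frac{\partial g(x)}{\partial x_j}\exp[g(x)]a(x) + \exp[g(x)]\frac{\partial a(x)}{\partial x_j} \\
&= \frac{\partial g(x)}{\partial x_j}E(y|x) + E(y|x)\frac{\partial a(x)}{\partial x_j}\cdot\frac{1}{a(x)}.
\end{aligned}$$

Therefore,

$$\frac{\partial E(y|x)}{\partial x_j}\cdot\frac{x_j}{E(y|x)} = \frac{\partial g(x)}{\partial x_j}\cdot x_j + \frac{\partial a(x)}{\partial x_j}\cdot\frac{x_j}{a(x)}.$$

We can establish this relationship more simply by assuming $E(y|x) > 0$ for all $x$ and using
equation (2.10).

b. Write $z_j \equiv \log(x_j)$ so $x_j = \exp(z_j)$. Then, using the chain rule,

$$\frac{\partial g(x)}{\partial\log(x_j)} = \frac{\partial g(x)}{\partial z_j} = \frac{\partial g(x)}{\partial x_j}\cdot\frac{\partial x_j}{\partial z_j} = \frac{\partial g(x)}{\partial x_j}\cdot\exp(z_j) = \frac{\partial g(x)}{\partial x_j}\cdot x_j.$$

c. From $\log(y) = g(x) + u$ and $E(u|x) = 0$ we have $E[\log(y)|x] = g(x)$. Therefore, using
(2.11), the elasticity would be simply

$$\frac{\partial g(x)}{\partial\log(x_j)} = \frac{\partial g(x)}{\partial x_j}\cdot x_j,$$

which, compared with the definition based on $E(y|x)$, omits the elasticity of $a(x)$ with respect
to $x_j$.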
2.9. This is easily shown by using iterated expectations:

$$E(x'y) = E[E(x'y|x)] = E[x'E(y|x)] = E[x'\mu(x)].$$

Therefore,

$$\delta = [E(x'x)]^{-1}E(x'y) = [E(x'x)]^{-1}E[x'\mu(x)],$$

and the latter equation is the vector of parameters in the linear projection of $\mu(x)$ on $x$.


2.10. a. As given in the hint, we can always write

$$E(y|x,s) = (1-s)\cdot\mu_0(x) + s\cdot\mu_1(x).$$

Now condition only on $s$ and use iterated expectations:

$$\begin{aligned}
E(y|s) = E[E(y|x,s)|s] &= E[(1-s)\cdot\mu_0(x) + s\cdot\mu_1(x)|s] \\
&= (1-s)E[\mu_0(x)|s] + sE[\mu_1(x)|s].
\end{aligned}$$

Therefore,

$$E(y|s=1) = E[\mu_1(x)|s=1], \qquad E(y|s=0) = E[\mu_0(x)|s=0],$$

and so, by adding and subtracting $E[\mu_0(x)|s=1]$, we get

$$\begin{aligned}
E(y|s=1) - E(y|s=0) &= E[\mu_1(x)|s=1] - E[\mu_0(x)|s=0] \\
&= \{E[\mu_1(x)|s=1] - E[\mu_0(x)|s=1]\} + \{E[\mu_0(x)|s=1] - E[\mu_0(x)|s=0]\}.
\end{aligned}$$

b. Use part a and linearity of the conditional means:

$$\begin{aligned}
E(y|s=1) - E(y|s=0) &= [E(x|s=1)\beta_1 - E(x|s=1)\beta_0] + [E(x|s=1)\beta_0 - E(x|s=0)\beta_0] \\
&= E(x|s=1)\cdot(\beta_1 - \beta_0) + [E(x|s=1) - E(x|s=0)]\cdot\beta_0.
\end{aligned}$$

This decomposition attributes the difference in the unconditional means,
$E(y|s=1) - E(y|s=0)$, to two pieces. The first part is due to differences in the regression
parameters, $\beta_1 - \beta_0$, where we evaluate the difference at the average of the covariates from
the $s = 1$ subpopulation. The second part is due to a difference in means of the covariates from
the two subpopulations, where we apply the regression coefficients from the $s = 0$
subpopulation. If, for example, the two regression functions are the same (that is, $\beta_1 = \beta_0$),
then any difference in the subpopulation means $E(y|s=0)$ and $E(y|s=1)$ is due to a
difference in averages of the covariates across the subpopulations. If the covariate means are
the same (that is, $E(x|s=1) = E(x|s=0)$), then $E(y|s=0)$ and $E(y|s=1)$ can still differ if
$\beta_1 \neq \beta_0$. In many applications, both pieces in $E(y|s=1) - E(y|s=0)$ are present.

Incidentally, the approach in this problem is not the only interesting way to decompose
$E(y|s=1) - E(y|s=0)$. See, for example, T.E. Elder, J.H. Goddeeris, and S.J. Haider,
“Unexplained Gaps and Oaxaca-Blinder Decompositions,” Labour Economics, 2010.
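The decomposition in part b of Problem 2.10 is simple to compute once the subpopulation covariate means and regression coefficients are available. The sketch below is only illustrative: the function name `oaxaca_blinder` and all numerical values are hypothetical, and the first element of each covariate vector is a constant.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def oaxaca_blinder(xbar1, xbar0, beta1, beta0):
    """Split E(y|s=1) - E(y|s=0) into the two pieces from part b.

    coef_part: E(x|s=1) * (beta1 - beta0)
    cov_part:  [E(x|s=1) - E(x|s=0)] * beta0
    """
    coef_part = dot(xbar1, [b1 - b0 for b1, b0 in zip(beta1, beta0)])
    cov_part = dot([x1 - x0 for x1, x0 in zip(xbar1, xbar0)], beta0)
    return coef_part, cov_part


# Hypothetical subpopulation means (constant, two covariates) and coefficients.
xbar1 = [1.0, 13.0, 12.0]
xbar0 = [1.0, 12.0, 10.0]
beta1 = [0.5, 0.08, 0.02]
beta0 = [0.4, 0.07, 0.02]

gap = dot(xbar1, beta1) - dot(xbar0, beta0)
coef_part, cov_part = oaxaca_blinder(xbar1, xbar0, beta1, beta0)
print(gap, coef_part, cov_part)
```

By construction the two pieces sum to the raw gap in means. Which subpopulation's coefficients weight the covariate difference is a choice; this sketch follows the one used in the solution ($\beta_0$).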




Solutions to Chapter 3 Problems 

3.1. To prove Lemma 3.1, we must show that for all $\varepsilon > 0$, there exists $b_\varepsilon < \infty$ and an
integer $N_\varepsilon$ such that $P[|x_N| > b_\varepsilon] < \varepsilon$, all $N \geq N_\varepsilon$. We use the following fact: since $x_N \overset{p}{\to} a$, for
any $\varepsilon > 0$ there exists an integer $N_\varepsilon$ such that $P[|x_N - a| > 1] < \varepsilon$ for all $N \geq N_\varepsilon$. [The existence
of $N_\varepsilon$ is implied by Definition 3.3(1).] But $|x_N| = |x_N - a + a| \leq |x_N - a| + |a|$ (by the triangle
inequality), and so $|x_N| - |a| \leq |x_N - a|$. It follows that $P[|x_N| - |a| > 1] \leq P[|x_N - a| > 1]$. Therefore,
in Definition 3.3(3) we can take $b_\varepsilon \equiv |a| + 1$ (irrespective of the value of $\varepsilon$) and then the
existence of $N_\varepsilon$ follows from Definition 3.3(1).

3.2. Each element of the $K \times 1$ vector $Z_Nx_N$ is the sum of $J$ terms of the form $Z_{Nkj}x_{Nj}$.
Because $Z_{Nkj} = o_p(1)$ and $x_{Nj} = O_p(1)$, each term in the sum is $o_p(1)$ from Lemma 3.2(4). By
Lemma 3.2(1), the sum of $o_p(1)$ terms is $o_p(1)$.

3.3. This follows immediately from Lemma 3.1 because $g(x_N) \overset{p}{\to} g(c)$.

3.4. Both parts follow from the continuous mapping theorem and basic properties of the
normal distribution.

a. The function defined by $g(z) = A'z$ is clearly continuous. Further, if $z \sim \mathrm{Normal}(0,V)$
then $A'z \sim \mathrm{Normal}(0,A'VA)$. By the continuous mapping theorem,

$$A'z_N \overset{d}{\to} A'z \sim \mathrm{Normal}(0,A'VA).$$

b. Because $V$ is nonsingular, the function $g(z) = z'V^{-1}z$ is continuous. But if
$z \sim \mathrm{Normal}(0,V)$, $z'V^{-1}z \sim \chi^2_K$. So $z_N'V^{-1}z_N \overset{d}{\to} z'V^{-1}z \sim \chi^2_K$.

3.5. a. Because $\mathrm{Var}(\bar{y}_N) = \sigma^2/N$, $\mathrm{Var}[\sqrt{N}(\bar{y}_N - \mu)] = N(\sigma^2/N) = \sigma^2$.

b. By the CLT, $\sqrt{N}(\bar{y}_N - \mu) \overset{a}{\sim} \mathrm{Normal}(0,\sigma^2)$, and so $\mathrm{Avar}[\sqrt{N}(\bar{y}_N - \mu)] = \sigma^2$.

c. We obtain $\mathrm{Avar}(\bar{y}_N)$ by dividing $\mathrm{Avar}[\sqrt{N}(\bar{y}_N - \mu)]$ by $N$. Therefore, $\mathrm{Avar}(\bar{y}_N) = \sigma^2/N$.
As expected, this coincides with the actual variance of $\bar{y}_N$.

d. The asymptotic standard deviation of $\bar{y}_N$ is the square root of its asymptotic variance, or
$\sigma/\sqrt{N}$.

e. To obtain the asymptotic standard error of $\bar{y}_N$, we need a consistent estimator of $\sigma$.
Typically, the unbiased estimator of $\sigma^2$ is used: $\hat\sigma^2 = (N-1)^{-1}\sum_{i=1}^N (y_i - \bar{y}_N)^2$, and then $\hat\sigma$ is
the positive square root. The asymptotic standard error of $\bar{y}_N$ is simply $\hat\sigma/\sqrt{N}$.
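Part e of Problem 3.5 translates directly into a few lines of code. This is a minimal sketch using only the Python standard library; the function name `mean_and_se` and the data values are hypothetical.

```python
import math
import statistics


def mean_and_se(y):
    """Return the sample mean and its asymptotic standard error.

    sigma_hat is the square root of the unbiased variance estimator
    (divisor N - 1), and the standard error is sigma_hat / sqrt(N).
    """
    n = len(y)
    ybar = statistics.fmean(y)
    sigma_hat = statistics.stdev(y)  # sample standard deviation, N - 1 divisor
    return ybar, sigma_hat / math.sqrt(n)


# Hypothetical data, for illustration only.
y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
ybar, se = mean_and_se(y)
print(ybar, se)
```

For these eight values the squared deviations sum to 32, so $\hat\sigma^2 = 32/7$ and the standard error is $\sqrt{(32/7)/8} = \sqrt{4/7}$.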

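3.6. From Definition 3.4, we need to show that for any $0 \leq c < 1/2$, $N^c(\hat\theta_N - \theta) = o_p(1)$.
But

$$N^c(\hat\theta_N - \theta) = N^{c-1/2}\sqrt{N}(\hat\theta_N - \theta) = N^{c-1/2}\cdot O_p(1).$$

Because $c < 1/2$, $N^{c-1/2} = o(1)$, and so $N^c(\hat\theta_N - \theta) = o(1)\cdot O_p(1) = o_p(1)$.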

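3.7. a. For $\theta > 0$ the natural logarithm is a continuous function, and so
$\mathrm{plim}[\log(\hat\theta)] = \log[\mathrm{plim}(\hat\theta)] = \log(\theta) = \gamma$.

b. We use the delta method to find $\mathrm{Avar}[\sqrt{N}(\hat\gamma - \gamma)]$. In the scalar case, if $\hat\gamma = g(\hat\theta)$ then
$\mathrm{Avar}[\sqrt{N}(\hat\gamma - \gamma)] = [dg(\theta)/d\theta]^2\mathrm{Avar}[\sqrt{N}(\hat\theta - \theta)]$. When $g(\theta) = \log(\theta)$, which is, of course,
continuously differentiable, $\mathrm{Avar}[\sqrt{N}(\hat\gamma - \gamma)] = (1/\theta)^2\mathrm{Avar}[\sqrt{N}(\hat\theta - \theta)]$.

c. In the scalar case, the asymptotic standard error of $\hat\gamma$ is generally $|dg(\hat\theta)/d\theta|\cdot\mathrm{se}(\hat\theta)$.
Therefore, for $g(\theta) = \log(\theta)$, $\mathrm{se}(\hat\gamma) = \mathrm{se}(\hat\theta)/\hat\theta$. When $\hat\theta = 4$ and
$\mathrm{se}(\hat\theta) = 2$, $\hat\gamma = \log(4) \approx 1.39$ and $\mathrm{se}(\hat\gamma) = 1/2$.

d. The asymptotic $t$ statistic for testing $H_0: \theta = 1$ is $(\hat\theta - 1)/\mathrm{se}(\hat\theta) = 3/2 = 1.5$.

e. Because $\gamma = \log(\theta)$, the null of interest can also be stated as $H_0: \gamma = 0$. The $t$ statistic
based on $\hat\gamma$ is about $1.39/(.5) = 2.78$. This leads to a very strong rejection of $H_0$, whereas the $t$
statistic based on $\hat\theta$ is, at best, marginally significant. The lesson is that, using the Wald test,
we can change the outcome of hypotheses tests by using nonlinear transformations.

3.8. a. This follows by Slutsky's Theorem since the function $g(\theta_1,\theta_2) = \theta_1/\theta_2$ is continuous
at all points in $\mathbb{R}^2$ where $\theta_2 \neq 0$: $\mathrm{plim}(\hat\theta_1/\hat\theta_2) = \mathrm{plim}(\hat\theta_1)/\mathrm{plim}(\hat\theta_2) = \theta_1/\theta_2$.

b. To find $\mathrm{Avar}(\hat\gamma)$ we need to find $\nabla_\theta g(\theta)$, where $g(\theta_1,\theta_2) = \theta_1/\theta_2$. But
$\nabla_\theta g(\theta) = (1/\theta_2, -\theta_1/\theta_2^2)$, and so $\mathrm{Avar}(\hat\gamma) = (1/\theta_2, -\theta_1/\theta_2^2)[\mathrm{Avar}(\hat\theta)](1/\theta_2, -\theta_1/\theta_2^2)'$.

c. If $\hat\theta = (-1.5, .5)'$ then $\nabla_\theta g(\hat\theta) = (2,6)$. Therefore,
$\mathrm{Avar}(\hat\gamma) = (2,6)[\mathrm{Avar}(\hat\theta)](2,6)' = 66.4$. Taking the square root gives $\mathrm{se}(\hat\gamma) \approx 8.15$.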
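The delta-method calculations in Problems 3.7 and 3.8 can be reproduced with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, and the matrix `V` used for $\mathrm{Avar}(\hat\theta)$ is not stated in the solution above; it is an assumed value chosen because it reproduces the quadratic form 66.4.

```python
import math


def delta_se_scalar(deriv, se_theta):
    """Scalar delta method: se(g(theta_hat)) = |g'(theta_hat)| * se(theta_hat)."""
    return abs(deriv) * se_theta


def quad_form(g, V):
    """Compute g V g' for a gradient (row) vector g and a square matrix V."""
    n = len(g)
    return sum(g[i] * V[i][j] * g[j] for i in range(n) for j in range(n))


# Problem 3.7: gamma = log(theta), with theta_hat = 4 and se(theta_hat) = 2.
theta_hat, se_theta = 4.0, 2.0
gamma_hat = math.log(theta_hat)
se_gamma = delta_se_scalar(1.0 / theta_hat, se_theta)
t_theta = (theta_hat - 1.0) / se_theta  # test of H0: theta = 1
t_gamma = gamma_hat / se_gamma          # test of H0: gamma = 0

# Problem 3.8: gamma = theta1 / theta2 at theta_hat = (-1.5, .5).
t1, t2 = -1.5, 0.5
grad = (1.0 / t2, -t1 / t2 ** 2)
V = [[1.0, -0.4], [-0.4, 2.0]]  # ASSUMED Avar(theta_hat); not given in the text above
avar_ratio = quad_form(grad, V)
se_ratio = math.sqrt(avar_ratio)

print(se_gamma, t_theta, t_gamma, grad, avar_ratio, se_ratio)
```

Without rounding $\log(4)$ to 1.39 first, the $t$ statistic for $H_0: \gamma = 0$ is about 2.77 rather than 2.78; the conclusion is unchanged.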


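3.9. By the delta method,

$$\mathrm{Avar}[\sqrt{N}(\hat\gamma - \gamma)] = G(\theta)V_1G(\theta)', \qquad \mathrm{Avar}[\sqrt{N}(\tilde\gamma - \gamma)] = G(\theta)V_2G(\theta)',$$

where $G(\theta) = \nabla_\theta g(\theta)$ is $Q \times P$. Therefore,

$$\mathrm{Avar}[\sqrt{N}(\tilde\gamma - \gamma)] - \mathrm{Avar}[\sqrt{N}(\hat\gamma - \gamma)] = G(\theta)(V_2 - V_1)G(\theta)'.$$

By assumption, $V_2 - V_1$ is positive semi-definite, and therefore $G(\theta)(V_2 - V_1)G(\theta)'$ is p.s.d.
This completes the proof.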


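3.10. By assumption, $\sigma^2 \equiv E(w_i^2) = \mathrm{Var}(w_i) < \infty$. Because of the i.i.d. assumption,

$$\mathrm{Var}(x_N) = (N^{-1/2})^2N\sigma^2 = \sigma^2.$$

Now, Chebyshev's inequality gives that for any $b_\varepsilon > 0$,

$$P[|x_N| \geq b_\varepsilon] \leq \frac{\mathrm{Var}(x_N)}{b_\varepsilon^2} = \frac{\sigma^2}{b_\varepsilon^2}.$$

Therefore, in the definition of $O_p(1)$, for any $\varepsilon > 0$ choose $b_\varepsilon = \sigma/\sqrt{\varepsilon}$ and $N_\varepsilon = 1$ and then
$P[|x_N| \geq b_\varepsilon] \leq \varepsilon$ for all $N \geq N_\varepsilon$.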


3.11. a. Let $x_N = N^{-1}\sum_{i=1}^N (w_i - \mu_i)$ so that

$$\mathrm{Var}(x_N) = N^{-2}\sum_{i=1}^N \mathrm{Var}(w_i) = N^{-2}\sum_{i=1}^N \sigma_i^2.$$

By Chebyshev's inequality, for any $\varepsilon > 0$,

$$P[|x_N| > \varepsilon] \leq \frac{\mathrm{Var}(x_N)}{\varepsilon^2} = \frac{N^{-2}\sum_{i=1}^N \sigma_i^2}{\varepsilon^2}.$$

It follows that $P[|x_N| > \varepsilon] \to 0$ as $N \to \infty$ if $N^{-2}\sum_{i=1}^N \sigma_i^2 \to 0$ as $N \to \infty$.

b. If $\sigma_i^2 < b < \infty$ for all $i$ (that is, the sequence of variances is bounded), then

$$N^{-2}\sum_{i=1}^N \sigma_i^2 \leq b/N \to 0 \text{ as } N \to \infty.$$

Thus, a uniform bound on the variances is sufficient for i.n.i.d. sequences to satisfy the WLLN.


Solutions to Chapter 4 Problems


4.1. a. Exponentiating equation (4.49) gives

$$\begin{aligned}
wage &= \exp(\beta_0 + \beta_1 married + \beta_2 educ + z\gamma + u) \\
&= \exp(u)\exp(\beta_0 + \beta_1 married + \beta_2 educ + z\gamma).
\end{aligned}$$

Therefore,

$$E(wage|x) = E[\exp(u)|x]\exp(\beta_0 + \beta_1 married + \beta_2 educ + z\gamma),$$

where $x$ denotes all explanatory variables. Now, if $u$ and $x$ are independent
then $E[\exp(u)|x] = E[\exp(u)] = \delta_0$, say. Therefore

$$E(wage|x) = \delta_0\exp(\beta_0 + \beta_1 married + \beta_2 educ + z\gamma).$$

If we set $married = 1$ and $married = 0$ in this expectation (keeping all else equal) and find the
proportionate increase we get

$$\frac{\delta_0\exp(\beta_0 + \beta_1 + \beta_2 educ + z\gamma) - \delta_0\exp(\beta_0 + \beta_2 educ + z\gamma)}{\delta_0\exp(\beta_0 + \beta_2 educ + z\gamma)} = \exp(\beta_1) - 1.$$

Thus, the percentage difference is $100\cdot[\exp(\beta_1) - 1]$.

b. Since $\theta_1 = 100\cdot[\exp(\beta_1) - 1] = g(\beta_1)$, we need the derivative of $g$ with respect to $\beta_1$:
$dg/d\beta_1 = 100\cdot\exp(\beta_1)$. The asymptotic standard error of $\hat\theta_1$ using the delta method is
obtained as the absolute value of $dg/d\beta_1$ times $\mathrm{se}(\hat\beta_1)$:

$$\mathrm{se}(\hat\theta_1) = [100\cdot\exp(\hat\beta_1)]\cdot\mathrm{se}(\hat\beta_1).$$

c. We can evaluate the conditional expectation in part a at two levels of education, say
$educ_0$ and $educ_1$, all else fixed. The proportionate change in expected wage from $educ_0$ to
$educ_1$ is

$$[\exp(\beta_2 educ_1) - \exp(\beta_2 educ_0)]/\exp(\beta_2 educ_0) = \exp[\beta_2(educ_1 - educ_0)] - 1 = \exp(\beta_2\Delta educ) - 1.$$

Using the same arguments in part b, $\hat\theta_2 = 100\cdot[\exp(\hat\beta_2\Delta educ) - 1]$ and
$\mathrm{se}(\hat\theta_2) = 100\cdot|\Delta educ|\exp(\hat\beta_2\Delta educ)\mathrm{se}(\hat\beta_2)$.

d. For the estimated version of equation (4.29), $\hat\beta_1 = .199$, $\mathrm{se}(\hat\beta_1) = .039$, $\hat\beta_2 = .065$, and
$\mathrm{se}(\hat\beta_2) = .006$. Therefore, $\hat\theta_1 = 22.01$ and $\mathrm{se}(\hat\theta_1) = 4.76$. For $\hat\theta_2$ we set $\Delta educ = 4$. Then
$\hat\theta_2 = 29.7$ and $\mathrm{se}(\hat\theta_2) = 3.11$.
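The formulas from parts b and c of Problem 4.1 can be wrapped in one small function and applied to the estimates quoted in part d. This is a sketch only; the function name `pct_effect` and its `delta` argument are hypothetical conveniences.

```python
import math


def pct_effect(beta_hat, se_beta, delta=1.0):
    """Percentage effect 100*[exp(beta*delta) - 1] and its delta-method se.

    delta is the change in the regressor (1 for a dummy such as married,
    Delta educ for a change in years of education).
    """
    growth = math.exp(beta_hat * delta)
    theta_hat = 100.0 * (growth - 1.0)
    se_theta = 100.0 * abs(delta) * growth * se_beta
    return theta_hat, se_theta


# Estimates quoted in part d (rounded to three decimals).
theta1, se1 = pct_effect(0.199, 0.039)             # married
theta2, se2 = pct_effect(0.065, 0.006, delta=4.0)  # four more years of education

print(round(theta1, 2), round(se1, 2), round(theta2, 2), round(se2, 2))
```

Because the inputs are the rounded coefficients, the results can differ from the figures in part d in the second decimal place.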

4.2. a. For each $i$ we have, by OLS.2, $E(u_i|x_i) = 0$. By independence across $i$ and Property
CE.5, $E(u_i|X) = E(u_i|x_i)$ because $(u_i,x_i)$ is independent of the explanatory variables for all
other observations. Letting $U$ be the $N \times 1$ vector of all errors, this implies $E(U|X) = 0$. But
$\hat\beta = \beta + (X'X)^{-1}X'U$ and so

$$E(\hat\beta|X) = \beta + (X'X)^{-1}X'E(U|X) = \beta + (X'X)^{-1}X'\cdot 0 = \beta.$$

b. From the expression for $\hat\beta$ in part a we have

$$\mathrm{Var}(\hat\beta|X) = \mathrm{Var}[(X'X)^{-1}X'U|X] = (X'X)^{-1}X'\mathrm{Var}(U|X)X(X'X)^{-1}.$$

Now, because $E(U|X) = 0$, $\mathrm{Var}(U|X) = E(UU'|X)$. For the diagonal terms,
$E(u_i^2|X) = E(u_i^2|x_i) = \mathrm{Var}(u_i|x_i) = \sigma^2$, where the last equality is the homoskedasticity
assumption. For the covariance terms, we must show that $E(u_iu_h|X) = 0$ for all
$i \neq h$, $i,h = 1,\ldots,N$. Again using Property CE.5, $E(u_iu_h|X) = E(u_iu_h|x_i,x_h)$ and
$E(u_i|x_i,u_h,x_h) = E(u_i|x_i) = 0$. But then $E(u_iu_h|x_i,u_h,x_h) = E(u_i|x_i,u_h,x_h)u_h = 0$. It follows
immediately by iterated expectations that conditioning on the smaller set also yields a zero
conditional mean: $E(u_iu_h|x_i,x_h) = 0$. Hence $\mathrm{Var}(U|X) = \sigma^2I_N$ and
$\mathrm{Var}(\hat\beta|X) = \sigma^2(X'X)^{-1}$. This completes the proof.

4.3. a. Not in general. The conditional variance can always be written as
$\mathrm{Var}(u|x) = E(u^2|x) - [E(u|x)]^2$; if $E(u|x) \neq 0$, then $E(u^2|x) \neq \mathrm{Var}(u|x)$.

b. It could be that $E(x'u) = 0$, in which case OLS is consistent, and $\mathrm{Var}(u|x)$ is constant.
But, generally, the usual standard errors would not be valid unless $E(u|x) = 0$ because it is
$E(u^2|x)$ that should be constant.
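4.4. For each $i$, $\hat{u}_i = y_i - x_i\hat\beta = u_i - x_i(\hat\beta - \beta)$, and so
$\hat{u}_i^2 = u_i^2 - 2u_ix_i(\hat\beta - \beta) + [x_i(\hat\beta - \beta)]^2$. Therefore, we can write

$$N^{-1}\sum_{i=1}^N \hat{u}_i^2x_i'x_i = N^{-1}\sum_{i=1}^N u_i^2x_i'x_i - 2N^{-1}\sum_{i=1}^N [u_ix_i(\hat\beta - \beta)]x_i'x_i + N^{-1}\sum_{i=1}^N [x_i(\hat\beta - \beta)]^2x_i'x_i.$$

Dropping the "$-2$", the second term can be written as the sum of $K$ terms of the form

$$N^{-1}\sum_{i=1}^N [u_ix_{ij}(\hat\beta_j - \beta_j)]x_i'x_i = (\hat\beta_j - \beta_j)N^{-1}\sum_{i=1}^N (u_ix_{ij})x_i'x_i = o_p(1)\cdot O_p(1),$$

where we have used $\hat\beta_j - \beta_j = o_p(1)$ and $N^{-1}\sum_{i=1}^N (u_ix_{ij})x_i'x_i = O_p(1)$ whenever
$E[|u_ix_{ij}x_{ih}x_{ik}|] < \infty$ for all $j$, $h$, and $k$ (as would just be assumed). Similarly, the third term can
be written as the sum of $K^2$ terms of the form

$$(\hat\beta_j - \beta_j)(\hat\beta_h - \beta_h)N^{-1}\sum_{i=1}^N (x_{ij}x_{ih})x_i'x_i = o_p(1)\cdot o_p(1)\cdot O_p(1) = o_p(1),$$

where we have used $N^{-1}\sum_{i=1}^N (x_{ij}x_{ih})x_i'x_i = O_p(1)$ whenever $E[|x_{ij}x_{ih}x_{ik}x_{im}|] < \infty$ for all $j$, $h$,
$k$, and $m$. We have shown that $N^{-1}\sum_{i=1}^N \hat{u}_i^2x_i'x_i = N^{-1}\sum_{i=1}^N u_i^2x_i'x_i + o_p(1)$, which is what we
wanted to show.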

4.5. Write equation (4.50) as $E(y|w) = w\delta$, where $w = (x,z)$. Since $\mathrm{Var}(y|w) = \sigma^2$, it
follows by Theorem 4.2 that $\mathrm{Avar}\,\sqrt{N}(\hat\delta - \delta)$ is $\sigma^2[E(w'w)]^{-1}$, where $\delta = (\beta',\gamma)'$. Importantly,
because $E(x'z) = 0$, $E(w'w)$ is block diagonal, with upper block $E(x'x)$ and lower block $E(z^2)$.
Inverting $E(w'w)$ and focusing on the upper $K \times K$ block gives

$$\mathrm{Avar}\,\sqrt{N}(\hat\beta - \beta) = \sigma^2[E(x'x)]^{-1}.$$

Next, we need to find $\mathrm{Avar}\,\sqrt{N}(\tilde\beta - \beta)$. It is helpful to write $y = x\beta + v$ where $v = \gamma z + u$
and $u \equiv y - E(y|x,z)$. Because $E(x'z) = 0$ and $E(x'u) = 0$, $E(x'v) = 0$. Further,
$E(v^2|x) = \gamma^2E(z^2|x) + E(u^2|x) + 2\gamma E(zu|x) = \gamma^2E(z^2|x) + \sigma^2$, where we use
$E(zu|x,z) = zE(u|x,z) = 0$ and $E(u^2|x,z) = \mathrm{Var}(y|x,z) = \sigma^2$. Unless $E(z^2|x)$ is constant, the
equation $y = x\beta + v$ generally violates the homoskedasticity assumption OLS.3. So, without
further assumptions,

$$\mathrm{Avar}\,\sqrt{N}(\tilde\beta - \beta) = [E(x'x)]^{-1}E(v^2x'x)[E(x'x)]^{-1}.$$

Now we can show $\mathrm{Avar}\,\sqrt{N}(\tilde\beta - \beta) - \mathrm{Avar}\,\sqrt{N}(\hat\beta - \beta)$ is positive semi-definite by writing

$$\begin{aligned}
\mathrm{Avar}\,\sqrt{N}(\tilde\beta - \beta) - \mathrm{Avar}\,\sqrt{N}(\hat\beta - \beta) &= [E(x'x)]^{-1}E(v^2x'x)[E(x'x)]^{-1} - \sigma^2[E(x'x)]^{-1} \\
&= [E(x'x)]^{-1}E(v^2x'x)[E(x'x)]^{-1} - \sigma^2[E(x'x)]^{-1}E(x'x)[E(x'x)]^{-1} \\
&= [E(x'x)]^{-1}[E(v^2x'x) - \sigma^2E(x'x)][E(x'x)]^{-1}.
\end{aligned}$$

Because $[E(x'x)]^{-1}$ is positive definite, it suffices to show that $E(v^2x'x) - \sigma^2E(x'x)$ is p.s.d.
To this end, let $h(x) \equiv E(z^2|x)$. Then by the law of iterated expectations,
$E(v^2x'x) = E[E(v^2|x)x'x] = \gamma^2E[h(x)x'x] + \sigma^2E(x'x)$. Therefore,
$E(v^2x'x) - \sigma^2E(x'x) = \gamma^2E[h(x)x'x]$, which, when $\gamma \neq 0$, is actually a positive definite matrix
except by fluke. In particular, if $E(z^2|x) = E(z^2) = \eta^2 > 0$ (in which case $y = x\beta + v$ satisfies
the homoskedasticity assumption OLS.3), $E(v^2x'x) - \sigma^2E(x'x) = \gamma^2\eta^2E(x'x)$, which is
positive definite.

4.6. Because nonwhite is determined at birth, we do not have to worry about nonwhite
being determined simultaneously with any kind of response variable. Measurement error is
certainly a possibility, as a binary indicator for being Caucasian is a very crude way to measure
race. Still, many studies hope to isolate systematic differences between those classified as
white versus other races, in which case a binary indicator might be a good proxy. Of course, it
is always possible that people are misclassified in survey data. But an important point is that
measurement error in nonwhite would not follow the classical errors-in-variables assumption.
For example, if the issue is simply recording the incorrect entry, then the true indicator,
nonwhite*, is also binary. Then, there are four possible outcomes: nonwhite* = 1 and
nonwhite = 1; nonwhite* = 0 and nonwhite = 1; nonwhite* = 1 and
nonwhite = 0; nonwhite* = 0 and nonwhite = 0. In the first and last cases, no error is made.
Generally, it makes no sense to write nonwhite = nonwhite* + e, where e is a mean-zero
measurement error that is independent of nonwhite*.

Probably in applications that seek to estimate a race effect, we would be most concerned 
about omitted variables. While race is determined at birth, it is not independent of other factors 
that generally affect economic and social outcomes. For example, we would want to include 
family income and wealth in an equation to test for discrimination in loan applications. If we 
cannot, and race is correlated with income and wealth, then an attempt to test for 
discrimination can fail. Many other applications could suffer from endogeneity caused by 
omitted variables. In looking at crime rates by race, we also need to control for family 
background characteristics. 

4.7. a. One important omitted factor in $u$ is family income: students that come from
wealthier families tend to do better in school, other things equal. Family income and PC
ownership are positively correlated because the probability of owning a PC increases with
family income. Another factor in $u$ is quality of high school. This may also be correlated with
PC: a student who had more exposure to computers in high school may be more likely to
own a computer.

b. $\hat\beta_3$ is likely to have an upward bias because of the positive correlation between $u$ and PC,
but it is not clear cut because of the other explanatory variables in the equation. If we write the
linear projection

$$u = \delta_0 + \delta_1 hsGPA + \delta_2 SAT + \delta_3 PC + r$$

then the bias is upward if $\delta_3$ is greater than zero. This measures the partial correlation between
$u$ (say, family income) and PC, and it is likely to be positive.

c. If data on family income can be collected then it can be included in the equation. If 
family income is not available sometimes level of parents’ education is. Another possibility is 
to use average house value in each student’s home zip code, as zip code is often part of school 
records. Proxies for high school quality might be faculty—student ratios, expenditure per 
student, average teacher salary, and so on. 

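4.8. a. $\partial E(y|x_1,x_2)/\partial x_1 = \beta_1 + \beta_3x_2$. Taking the expected value of this equation with
respect to the distribution of $x_2$ gives $\alpha_1 \equiv \beta_1 + \beta_3\mu_2$. Similarly,
$\partial E(y|x_1,x_2)/\partial x_2 = \beta_2 + \beta_3x_1 + 2\beta_4x_2$, and its expected value is $\alpha_2 \equiv \beta_2 + \beta_3\mu_1 + 2\beta_4\mu_2$.

b. One way to write $E(y|x_1,x_2)$ is

$$E(y|x_1,x_2) = \delta_0 + \alpha_1x_1 + \alpha_2x_2 + \beta_3(x_1 - \mu_1)(x_2 - \mu_2) + \beta_4(x_2 - \mu_2)^2,$$

where $\delta_0 = \beta_0 - \beta_3\mu_1\mu_2 - \beta_4\mu_2^2$ (as can be verified by matching the intercepts in the two
equations).

c. Regress $y_i$ on $1$, $x_{i1}$, $x_{i2}$, $(x_{i1} - \mu_1)(x_{i2} - \mu_2)$, $(x_{i2} - \mu_2)^2$, $i = 1,2,\ldots,N$. If we do not
know $\mu_1$ and $\mu_2$, we can estimate these using the sample averages, $\bar{x}_1$ and $\bar{x}_2$.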


d. The following Stata session can be used to answer this part:

sum educ exper

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        educ |       935    13.46845    2.196654          9         18
       exper |       935    11.56364    4.374586          1         23

gen educ0exper0 = (educ - 13.47)*(exper - 11.56)

gen exper0sq = (exper - 11.56)^2

reg lwage educ exper educ0exper0 exper0sq

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  4,   930) =   36.94
       Model |  22.7093743     4  5.67734357           Prob > F      =  0.0000
    Residual |  142.946909   930  .153706354           R-squared     =  0.1371
-------------+------------------------------           Adj R-squared =  0.1334
       Total |  165.656283   934  .177362188           Root MSE      =  .39205

---------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t     [95% Conf. Interval]
-------------+-------------------------------------------------------
        educ |   .0837981   .0069787    12.01     .0701022     .097494
       exper |   .0223954   .0034481     6.49     .0156284    .0291624
 educ0exper0 |   .0045485   .0017652     2.58     .0010843    .0080127
    exper0sq |   .0009943    .000653     1.52    -.0002872    .0022758
       _cons |   5.392285   .1207342    44.66     5.155342    5.629228
---------------------------------------------------------------------

gen educexper = educ*exper

gen expersq = exper^2

reg lwage educ exper educexper expersq

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  4,   930) =   36.94
       Model |  22.7093743     4  5.67734357           Prob > F      =  0.0000
    Residual |  142.946909   930  .153706354           R-squared     =  0.1371
-------------+------------------------------           Adj R-squared =  0.1334
       Total |  165.656283   934  .177362188           Root MSE      =  .39205

---------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t     [95% Conf. Interval]
-------------+-------------------------------------------------------
        educ |   .0312176   .0193142     1.62    -.0066869    .0691221
       exper |  -.0618608   .0331851    -1.86    -.1269872    .0032656
   educexper |   .0045485   .0017652     2.58     .0010843    .0080127
     expersq |   .0009943    .000653     1.52    -.0002872    .0022758
       _cons |   6.233415   .3044512    20.47     5.635924    6.830906
---------------------------------------------------------------------


In the equation where educ and exper are both demeaned before creating the interaction
and the squared terms, the coefficients on educ and exper seem reasonable. For example, the
coefficient on educ means that, at the average level of experience, the return to another year of
education is about 8.4%. As experience increases above its average value, the return to
education also increases (by .45 percentage points for each year of experience above 11.56).
In the model containing $educ \cdot exper$ and $exper^2$, the coefficient on educ is the return to
education when exper = 0, which is not an especially interesting segment of the population, and
certainly not representative of the men in the sample. (Notice that the standard error of $\hat\beta_{educ}$ in
the second regression is almost three times the standard error in the first regression. This
difference illustrates that we can estimate the marginal effect at the average values of the
covariates much more precisely than at extreme values of the covariates.) The coefficient on
exper in the first regression is the return to another year of experience at the average values of
both educ and exper. So, for a man with about 13.5 years of education and 11.6 years of
experience, another year of experience is estimated to be worth about 2.2%. In the second
regression, where educ and exper are not first demeaned, the coefficient on exper is the return
to the first year of experience for a man with no schooling. This is not an interesting part of the
U.S. population, and, in a sample where the lowest completed grade is ninth, we have no hope
of estimating such an effect, anyway. The negative, large coefficient on exper in the second
regression is puzzling only when we forget what it actually estimates. Note that the standard
error on $\hat\beta_{exper}$ in the second regression is about 10 times as large as the standard error in the
first regression.
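The two regressions in part d are reparameterizations of the same model, so the centered coefficients can be recovered from the uncentered ones using the expressions for $\alpha_1$, $\alpha_2$, and $\delta_0$ in parts a and b. The sketch below does this arithmetic with the coefficients reported in the Stata output; the variable names are hypothetical, and the centering values are the rounded means 13.47 and 11.56 used in the `gen` commands.

```python
# Coefficients from the regression on educ, exper, educ*exper, exper^2.
b_cons = 6.233415
b_educ = 0.0312176
b_exper = -0.0618608
b_inter = 0.0045485    # coefficient on educ*exper
b_expersq = 0.0009943  # coefficient on exper^2

# Centering values used when constructing the demeaned terms.
mu_educ, mu_exper = 13.47, 11.56

# alpha_1 = beta_1 + beta_3*mu_2: return to education at average experience.
alpha_educ = b_educ + b_inter * mu_exper

# alpha_2 = beta_2 + beta_3*mu_1 + 2*beta_4*mu_2: return to experience
# at average education and average experience.
alpha_exper = b_exper + b_inter * mu_educ + 2.0 * b_expersq * mu_exper

# delta_0 = beta_0 - beta_3*mu_1*mu_2 - beta_4*mu_2^2: intercept of the
# centered equation.
delta_cons = b_cons - b_inter * mu_educ * mu_exper - b_expersq * mu_exper ** 2

print(round(alpha_educ, 7), round(alpha_exper, 7), round(delta_cons, 6))
```

Up to rounding in the reported coefficients, these match the estimates on educ, exper, and the constant in the first regression (.0837981, .0223954, and 5.392285).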


4.9. a. Just subtract $\log(y_{-1})$ from both sides and define $\Delta\log(y) = \log(y) - \log(y_{-1})$:

$$\Delta\log(y) = \beta_0 + x\beta + (\alpha_1 - 1)\log(y_{-1}) + u.$$

Clearly, the intercept and slope estimates on $x$ will be the same. The coefficient on $\log(y_{-1})$
becomes $\alpha_1 - 1$.

b. For simplicity, let $w = \log(y)$ and $w_{-1} = \log(y_{-1})$. Then the population slope coefficient
in a simple regression is always $\alpha_1 = \mathrm{Cov}(w_{-1},w)/\mathrm{Var}(w_{-1})$. By assumption,
$\mathrm{Var}(w) = \mathrm{Var}(w_{-1})$, which means we can write $\alpha_1 = \mathrm{Cov}(w_{-1},w)/(\sigma_{w_{-1}}\sigma_w)$, where
$\sigma_{w_{-1}} = \mathrm{sd}(w_{-1})$ and $\sigma_w = \mathrm{sd}(w)$. But $\mathrm{Corr}(w_{-1},w) = \mathrm{Cov}(w_{-1},w)/(\sigma_{w_{-1}}\sigma_w)$, and since a
correlation coefficient is always between $-1$ and $1$, the result follows.


4.10. Write the linear projection of $x_K^*$ onto the other explanatory variables as
$x_K^* = \delta_0 + \delta_1x_1 + \delta_2x_2 + \ldots + \delta_{K-1}x_{K-1} + r_K^*$. Now, because $x_K = x_K^* + e_K$,

$$\begin{aligned}
L(x_K|1,x_1,\ldots,x_{K-1}) &= L(x_K^*|1,x_1,\ldots,x_{K-1}) + L(e_K|1,x_1,\ldots,x_{K-1}) \\
&= L(x_K^*|1,x_1,\ldots,x_{K-1})
\end{aligned}$$

because $e_K$ has zero mean and is uncorrelated with $x_1,\ldots,x_{K-1}$ [and so
$L(e_K|1,x_1,\ldots,x_{K-1}) = 0$]. But the linear projection error $r_K$ is

$$r_K \equiv x_K - L(x_K|1,x_1,\ldots,x_{K-1}) = [x_K^* - L(x_K^*|1,x_1,\ldots,x_{K-1})] + e_K = r_K^* + e_K.$$

Now we can use the two-step projection formula: the coefficient on $x_K$ in $L(y|1,x_1,\ldots,x_K)$ is
the coefficient in $L(y|r_K)$, say $\pi_1$. But

$$\pi_1 = \mathrm{Cov}(r_K,y)/\mathrm{Var}(r_K) = \beta_K\mathrm{Cov}(r_K,x_K^*)/\mathrm{Var}(r_K)$$

since $e_K$ is uncorrelated with $x_1,\ldots,x_{K-1},x_K^*$, and $v$ by assumption and $r_K^*$ is uncorrelated with
$x_1,\ldots,x_{K-1}$, by definition. Now $\mathrm{Cov}(r_K,x_K^*) = \mathrm{Var}(r_K^*)$ and $\mathrm{Var}(r_K) = \mathrm{Var}(r_K^*) + \mathrm{Var}(e_K)$
[because $\mathrm{Cov}(r_K^*,e_K) = 0$]. Therefore $\pi_1$ is given by equation (4.47), which is what we wanted
to show.
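The result of Problem 4.10 says that the coefficient on the mismeasured regressor is $\beta_K$ scaled by $\mathrm{Var}(r_K^*)/[\mathrm{Var}(r_K^*) + \mathrm{Var}(e_K)]$. The following sketch evaluates that expression; the function name `attenuated_coef` and the numerical inputs are hypothetical.

```python
def attenuated_coef(beta_k, var_r_star, var_e):
    """Coefficient on x_K under classical errors-in-variables.

    pi_1 = beta_K * Var(r*_K) / [Var(r*_K) + Var(e_K)], where r*_K is the
    error from projecting x*_K on the other regressors and e_K is the
    measurement error.
    """
    if var_r_star <= 0 or var_e < 0:
        raise ValueError("need Var(r*_K) > 0 and Var(e_K) >= 0")
    return beta_k * var_r_star / (var_r_star + var_e)


# Hypothetical values: true coefficient 1, Var(r*_K) = 3, Var(e_K) = 1.
pi_1 = attenuated_coef(1.0, 3.0, 1.0)
no_error = attenuated_coef(1.0, 3.0, 0.0)
print(pi_1, no_error)
```

With no measurement error the coefficient is unchanged. Otherwise it shrinks toward zero, and the shrinkage is stronger when the other regressors explain more of $x_K^*$, because that lowers $\mathrm{Var}(r_K^*)$.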


4.11. Here is some Stata output obtained to answer this question: 


reg lwage exper tenure married south urban black educ iq kww

      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F(  9,   925) =   37.28
       Model |  44.0967944     9  4.89964382           Prob > F      =  0.0000
    Residual |  121.559489   925  .131415664           R-squared     =  0.2662
-------------+------------------------------           Adj R-squared =  0.2591
       Total |  165.656283   934  .177362188           Root MSE      =  .36251

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0127522   .0032308     3.95   0.000     .0064117    .0190927
      tenure |   .0109248   .0024457     4.47   0.000      .006125    .0157246
     married |   .1921449   .0389094     4.94   0.000     .1157839    .2685059
       south |  -.0820295   .0262222    -3.13   0.002    -.1334913   -.0305676
       urban |   .1758226   .0269095     6.53   0.000     .1230118    .2286334
       black |  -.1303995   .0399014    -3.27   0.001    -.2087073   -.0520917
        educ |   .0498375    .007262     6.86   0.000     .0355856    .0640893
          iq |   .0031183   .0010128     3.08   0.002     .0011306    .0051059
         kww |    .003826   .0018521     2.07   0.039     .0001911    .0074608
       _cons |   5.175644    .127776    40.51   0.000     4.924879    5.426408
------------------------------------------------------------------------------

test iq kww

 ( 1)  iq = 0
 ( 2)  kww = 0

       F(  2,   925) =    8.59
            Prob > F =    0.0002


a. The estimated return to education using both IQ and KWW as proxies for ability is about
5%. When we used no proxy the estimated return was about 6.5%, and with only IQ as a proxy
it was about 5.4%. Thus, we have an even lower estimated return to education, but it is still
practically nontrivial and statistically very significant.

b. We can see from the t statistics that these variables are going to be jointly significant.
The F test verifies this, with p-value = .0002.

c. The wage differential between nonblacks and blacks does not disappear. Blacks are 
estimated to earn about 13% less than nonblacks, holding other factors in the regression fixed. 


d. Adding the interaction terms described in the problem gives the following results: 


sum iq kww 


    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
          iq |     935    101.2824   15.05264         50        145
         kww |     935    35.74439   7.638788         12         56


gen educiq0 = educ*(iq - 100)
gen educkww0 = educ*(kww - 35.74)
reg lwage exper tenure married south urban black educ iq kww educiq0 educkww0


      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------           F( 11,   923) =   31.48
       Model |  45.1916886    11  4.10833533           Prob > F      =  0.0000
    Residual |  120.464595   923  .130514187           R-squared     =  0.2728
-------------+------------------------------           Adj R-squared =  0.2641
       Total |  165.656283   934  .177362188           Root MSE      =  .36127

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0121544   .0032358     3.76   0.000      .005804    .0185047
      tenure |   .0107206   .0024383     4.40   0.000     .0059353     .015506
     married |   .1978269   .0388272     5.10   0.000     .1216271    .2740267
       south |  -.0807609   .0261374    -3.09   0.002    -.1320565   -.0294652
       urban |    .178431    .026871     6.64   0.000     .1256957    .2311664
       black |  -.1381481   .0399615    -3.46   0.001    -.2165741   -.0597221
        educ |   .0452316   .0076472     5.91   0.000     .0302235    .0602396
          iq |   .0048228   .0057333     0.84   0.400     -.006429    .0160745
         kww |  -.0248007   .0107382    -2.31   0.021    -.0458749   -.0037266
     educiq0 |  -.0001138   .0004228    -0.27   0.788    -.0009436    .0007161
    educkww0 |    .002161   .0007957     2.72   0.007     .0005994    .0037227
       _cons |   6.080005   .5610875    10.84   0.000     4.978849     7.18116
------------------------------------------------------------------------------


test educiq0 educkww0


 ( 1)  educiq0 = 0
 ( 2)  educkww0 = 0

       F(  2,   923) =    4.19
            Prob > F =    0.0154


The interaction educkww0 is statistically significant, and the two interactions are jointly
significant at the 2% significance level. The estimated return to education at the average values
of IQ and KWW (in the population and sample, respectively) is somewhat smaller now: about
4.5%. Further, as KWW increases above its mean, the return to education increases. For
example, if KWW is about one standard deviation (7.64) above its mean, the return to
education is about .045 + .0022(7.6) = .06172, or about 6.2%. So “knowledge of the world of
work” interacts positively with education levels.
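
The arithmetic in part d can be checked with a short Python sketch. It evaluates the estimated return to education at a given value of KWW (with IQ held at 100) using the unrounded coefficients from the output above. The function name and constant names are illustrative choices, not part of the original Stata session.

```python
# Estimated return to education as a function of KWW, holding IQ at 100.
# Coefficients are copied from the Stata output for part d.
B_EDUC = 0.0452316       # coefficient on educ
B_EDUCKWW0 = 0.002161    # coefficient on educkww0 = educ*(kww - 35.74)
KWW_CENTER = 35.74       # centering value used in the gen command
KWW_SD = 7.638788        # sample standard deviation of kww


def return_to_educ(kww):
    """Partial effect of educ on lwage at a given kww (with iq = 100)."""
    return B_EDUC + B_EDUCKWW0 * (kww - KWW_CENTER)


at_mean = return_to_educ(KWW_CENTER)
one_sd_above = return_to_educ(KWW_CENTER + KWW_SD)

print(f"return at mean KWW:          {100 * at_mean:.2f}%")
print(f"return one SD above the mean: {100 * one_sd_above:.2f}%")
```

With the unrounded coefficients the second figure is again roughly 6.2%, in line with the hand calculation.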


4.12. Here is the Stata output when union is added to both equations: 


reg lscrap grant union if d88 


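      Source |       SS       df       MS              Number of obs =      54
-------------+------------------------------           F(  2,    51) =    1.16
       Model |  4.59902319     2  2.29951159           Prob > F      =  0.3204
    Residual |  100.763637    51  1.97575759           R-squared     =  0.0436
-------------+------------------------------
       Total |   105.36266    53  1.98797472

------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------
       grant |  -.0276192   .4043649    0.946    -.8394156    .7841772
       union |   .6222888   .4096347    0.135    -.2000873    1.444665
       _cons |   .2307292   .2648551    0.388    -.3009896     .762448
------------------------------------------------------------------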


reg lscrap 


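      Source |       SS       df       MS              Number of obs =      54
-------------+------------------------------
       Model |  92.7289733     3  30.9096578
    Residual |  12.6336868    50  .252673735
-------------+------------------------------
       Total |   105.36266    53  1.98797472

-----------------------------------------------------------
       lscrap |      Coef.   Std. Err.      t     [95% Conf.
--------------+--------------------------------------------
        grant |  -.2851103   .1452619    -1.96    -.5768775
        union |   .2580653   .1477832     1.75    -.0387659
log(scrap_-1) |   .8210298    .043962    18.68     .7327295
        _cons |  -.0477754   .0958824    -0.50    -.2403608
-----------------------------------------------------------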


The basic story does not change: initially, the grant is estimated to have essentially no
effect, but adding $\log(scrap_{-1})$ gives the grant a strong effect that is marginally statistically
significant. Interestingly, unionized firms are estimated to have larger scrap rates; over 25%
more in the second equation. The effect is significant at the 10% level.


4.13. a. Using the 90 counties for 1987 gives 


reg lcrmrte lprbarr lprbconv lprbpris lavgsen if d87 


      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------
       Model |  11.1549601     4  2.78874002
    Residual |  15.6447379    85   .18405574
-------------+------------------------------
       Total |   26.799698    89  .301120202

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.7239696   .1153163    -6.28             -.9532493   -.4946898
    lprbconv |  -.4725112   .0831078    -5.69             -.6377519   -.3072706
    lprbpris |   .1596698   .2064441     0.77             -.2507964     .570136
     lavgsen |   .0764213   .1634732     0.47             -.2486073    .4014499
       _cons |  -4.867922   .4315307   -11.28   0.000    -5.725921   -4.009923
------------------------------------------------------------------------------


Because of the log-log functional form, all coefficients are elasticities. The elasticities of
crime with respect to the arrest and conviction probabilities have the signs we expect, and both
are practically and statistically significant. The elasticities with respect to the probability of
serving a prison term and the average sentence length are positive but are statistically
insignificant.


b. To add the previous year’s crime rate we first generate the first lag of lcrmrte:


xtset county year 
panel variable: county (strongly balanced) 
time variable: year, 81 to 87 
delta: 1 unit 


gen lcrmrte_1 = L.lcrmrte 
(90 missing values generated) 


reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmrte_1 if d87 


      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F(  5,    84) =  113.90
       Model |  23.3549731     5  4.67099462           Prob > F      =  0.0000
    Residual |   3.4447249    84   .04100863           R-squared     =  0.8715
-------------+------------------------------           Adj R-squared =  0.8638
       Total |   26.799698    89  .301120202           Root MSE      =  .20251

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.1850424   .0627624    -2.95   0.004    -.3098523   -.0602325
    lprbconv |  -.0386768   .0465999    -0.83   0.409    -.1313457    .0539921
    lprbpris |  -.1266874   .0988505    -1.28   0.204    -.3232625    .0698876
     lavgsen |  -.1520228   .0782915    -1.94   0.056    -.3077141    .0036684
   lcrmrte_1 |   .7798129   .0452114    17.25   0.000     .6899051    .8697208
       _cons |  -.7666256   .3130986    -2.45   0.016    -1.389257   -.1439946
------------------------------------------------------------------------------


There are some notable changes in the coefficients on the original variables. The
elasticities with respect to prbarr and prbconv are much smaller now, but still have signs
predicted by a deterrent-effect story. The conviction probability is no longer statistically
significant. Adding the lagged crime rate changes the signs of the elasticities with respect to
prbpris and avgsen, and the latter is almost statistically significant at the 5% level against a
two-sided alternative (p-value = .056). Not surprisingly, the elasticity with respect to the
lagged crime rate is large and very statistically significant. (The elasticity is also statistically
less than unity.)
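
The parenthetical claim that the elasticity is statistically less than unity can be verified from the reported estimate and standard error. The following Python sketch computes the usual t statistic for the null hypothesis that the coefficient on lcrmrte_1 equals one; the variable names are illustrative.

```python
# t statistic for H0: coefficient on lcrmrte_1 equals one, using the
# estimate and standard error reported in the Stata output for part b.
coef = 0.7798129
se = 0.0452114

t_unity = (coef - 1.0) / se
print(f"t statistic for H0: elasticity = 1: {t_unity:.2f}")
```

A value near -4.9 leads to rejection of a unit elasticity at any conventional significance level, which agrees with the upper confidence limit of about .87 in the output.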


c. Adding the logs of the nine wage variables gives the following: 


reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmrte_1 lwcon-lwloc if d87


      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F( 14,    75) =   43.81
       Model |  23.8798774    14  1.70570553
    Residual |  2.91982063    75  .038930942
-------------+------------------------------
       Total |   26.799698    89  .301120202

----------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      [95% Conf. Interval]
-------------+--------------------------------------------------
     lprbarr |  -.1725122   .0659533     -.3038978   -.0411265
    lprbconv |  -.0683639    .049728     -.1674273    .0306994
    lprbpris |  -.2155553   .1024014     -.4195493   -.0115614
     lavgsen |  -.1960546   .0844647      -.364317   -.0277923
   lcrmrte_1 |   .7453414   .0530331      .6396942    .8509887
       lwcon |  -.2850008   .1775178     -.6386344    .0686327
       lwtuc |   .0641312    .134327     -.2034619    .3317244
       lwtrd |    .253707   .2317449     -.2079525    .7153665
       lwfir |  -.0835258   .1964974     -.4749687    .3079171
       lwser |   .1127542   .0847427     -.0560619    .2815703
       lwmfg |   .0987371   .1186099     -.1375459    .3350201
       lwfed |   .3361278   .2453134     -.1525615    .8248172
       lwsta |   .0395089   .2072112     -.3732769    .4522947
       lwloc |  -.0369855   .3291546     -.6926951    .6187241
       _cons |  -3.792525   1.957472     -7.692009    .1069593
----------------------------------------------------------------


testparm lwcon-lwloc

 ( 1)  lwcon = 0
 ( 2)  lwtuc = 0
 ( 3)  lwtrd = 0
 ( 4)  lwfir = 0
 ( 5)  lwser = 0
 ( 6)  lwmfg = 0
 ( 7)  lwfed = 0
 ( 8)  lwsta = 0
 ( 9)  lwloc = 0

       F(  9,    75) =    1.50
            Prob > F =    0.1643


The nine wage variables are jointly insignificant even at the 15% level. Plus, the elasticities
are not consistently positive or negative. The two largest elasticities, which also have the
largest absolute t statistics, have opposite signs. These are with respect to the wage in
construction (-.285) and the wage for federal employees (.336).
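
As a cross-check, the nonrobust F statistic for the nine wage variables can be recovered from the residual sums of squares reported in parts b and c. The Python sketch below uses the standard sum-of-squared-residuals form of the F statistic, which is valid under homoskedasticity; the helper name `f_stat` is an illustrative choice.

```python
def f_stat(ssr_r, ssr_ur, q, df_ur):
    """F statistic for q exclusion restrictions from restricted and
    unrestricted sums of squared residuals (homoskedasticity assumed)."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)


# Restricted model: part b (no wage variables), SSR = 3.4447249.
# Unrestricted model: part c, SSR = 2.91982063 with 75 residual df.
f_wages = f_stat(ssr_r=3.4447249, ssr_ur=2.91982063, q=9, df_ur=75)
print(f"F(9, 75) = {f_wages:.2f}")
```

The result rounds to 1.50, matching the testparm output.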


d. The following Stata output gives the heteroskedasticity-robust F statistic:

qui reg lcrmrte lprbarr lprbconv lprbpris lavgsen lcrmrte_1 lwcon-lwloc if


testparm lwcon-lwloc


 ( 1)  lwcon = 0
 ( 2)  lwtuc = 0
 ( 3)  lwtrd = 0
 ( 4)  lwfir = 0
 ( 5)  lwser = 0
 ( 6)  lwmfg = 0
 ( 7)  lwfed = 0
 ( 8)  lwsta = 0
 ( 9)  lwloc = 0

       F(  9,    75) =    2.19
            Prob > F =    0.0319


Therefore, we would reject the null at the 5% significance level. But we might hesitate to
rely on asymptotic theory, which the heteroskedasticity-robust test requires, with N = 90
and K = 15 parameters to estimate. (This heteroskedasticity-robust F statistic is the
heteroskedasticity-robust Wald statistic divided by the number of restrictions being tested,
which is nine in this example. The division by the number of restrictions turns the asymptotic
chi-square statistic into one that can be treated as having roughly an F distribution.)
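
To see what the division by the number of restrictions does, the following Python sketch recovers the robust Wald statistic from the reported F statistic and computes its asymptotic chi-square p-value. The helper `chi2_sf_odd` is written here for illustration; it uses the closed-form tail expression for a chi-square distribution with odd degrees of freedom (Abramowitz and Stegun, 26.4.4) so that no external library is needed.

```python
import math


def chi2_sf_odd(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with odd df."""
    if df < 1 or df % 2 != 1:
        raise ValueError("df must be a positive odd integer")
    root = math.sqrt(x)
    normal_tail = 0.5 * math.erfc(root / math.sqrt(2.0))        # P(Z > sqrt(x))
    normal_pdf = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)  # phi(sqrt(x))
    series = 0.0
    term = root  # first term: sqrt(x)^(2r-1) / (1*3*...*(2r-1)) with r = 1
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)  # move to the next term of the series
    return 2.0 * normal_tail + 2.0 * normal_pdf * series


q = 9            # number of restrictions
f_robust = 2.19  # robust F statistic from the testparm output
wald = q * f_robust
p_chi2 = chi2_sf_odd(wald, q)

print(f"robust Wald statistic: {wald:.2f}")
print(f"chi-square(9) p-value: {p_chi2:.4f}")
```

The chi-square p-value comes out near .02, somewhat smaller than the .0319 obtained from the F(9, 75) distribution, so the F version is the more conservative of the two in this sample.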

4.14. a. Before doing the regression, it is helpful to know some summary statistics for the
variables of primary interest:


sum stndfnl atndrte 


    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     stndfnl |     680    .0296589   .9894611  -3.308824   2.783613
     atndrte |     680    81.70956   17.04699       6.25        100


Because the final exam score has been standardized, it has close to a zero mean and its
standard deviation is close to one. The values are not closer to zero and one, respectively,
because the standardization was done with a larger data set that included students with missing
values on other key variables. It might make sense to redefine the standardized test score using
the mean and standard deviation in the sample of 680, but the effect should be minor.

The regression that controls only for year in school in addition to attendance rate is as
follows:


reg stndfnl atndrte frosh soph 


      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  3,   676) =    6.74
       Model |  19.3023776     3  6.43412588           Prob > F      =  0.0002
    Residual |   645.46119   676  .954824246           R-squared     =  0.0290
-------------+------------------------------           Adj R-squared =  0.0247
       Total |  664.763568   679  .979033237           Root MSE      =  .97715

------------------------------------------------------------------------------
     stndfnl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     atndrte |   .0081634   .0022031     3.71   0.000     .0038376    .0124892
       frosh |  -.2898943   .1157244    -2.51   0.012    -.5171168   -.0626719
        soph |  -.1184456   .0990267    -1.20   0.232    -.3128824    .0759913
       _cons |  -.5017308    .196314    -2.56   0.011    -.8871893   -.1162724
------------------------------------------------------------------------------


If atndrte increases by 10 percentage points (say, from 75 to 85), the standardized test 
score is estimated to increase by about .082 standard deviations. 

b. Certainly there is a potential for self-selection. The better students may also be the ones 
attending lecture more regularly. So the positive effect of the attendance rate simply might 
capture the fact that better students tend to do better on exams. It is unlikely that controlling 
just for year in college (frosh and soph) solves the endogeneity of atndrte.


c. Adding priGPA and ACT gives 


reg stndfnl atndrte frosh soph priGPA ACT 


      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  5,   674) =   34.93
       Model |  136.801957     5  27.3603913           Prob > F      =  0.0000
    Residual |  527.961611   674  .783325833           R-squared     =  0.2058
-------------+------------------------------           Adj R-squared =  0.1999
       Total |  664.763568   679  .979033237

------------------------------------------------------------------------------
     stndfnl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     atndrte |   .0052248   .0023844     2.19   0.029      .000543    .0099065
       frosh |  -.0494692   .1078903    -0.46   0.647    -.2613108    .1623724
        soph |  -.1596475   .0897716    -1.78   0.076    -.3359132    .0166181
      priGPA |   .4265845   .0819203     5.21   0.000     .2657348    .5874343
         ACT |   .0844119   .0111677     7.56   0.000     .0624843    .1063395
       _cons |  -3.297342    .308831   -10.68   0.000    -3.903729   -2.690956
------------------------------------------------------------------------------


The effect of atndrte has fallen, which is what we expect if we think better, smarter 
students also attend lectures more frequently. The estimate now is that a 10 percentage point 
increase in atndrte increases the standardized test score by .052 standard deviations; the effect 
is statistically significant at the usual 5% level against a two-sided alternative, but the t statistic
is much lower than in part a. The strong positive effects of prior GPA and ACT score are also 
expected. 

d. Controlling for priGPA and ACT causes the sophomore effect (relative to students in 
year three and beyond) to get slightly larger in magnitude and more statistically significant. 
These data are for a course taught in the second term, so each frosh student does have a prior 
GPA — his or her GPA for the first semester in college. Adding priGPA in particular causes the 
“freshman effect” to essentially disappear. This is not too surprising because the average prior 
GPA for first-year students is notably less than the overall average priGPA. 

e. Here is the Stata session for adding squares in the proxy variables. Because we are not
interested in the effects of the proxies, we do not demean them before creating the squared
terms:


gen priGPAsq = priGPA^2
gen ACTsq = ACT^2
reg stndfnl atndrte frosh soph priGPA ACT priGPAsq ACTsq


      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  7,   672) =   28.94
       Model |  153.974309     7  21.9963299           Prob > F      =  0.0000
    Residual |  510.789259   672  .760103064           R-squared     =  0.2316
-------------+------------------------------           Adj R-squared =  0.2236
       Total |  664.763568   679  .979033237           Root MSE      =  .87184

------------------------------------------------------------------------------
     stndfnl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     atndrte |   .0062317   .0023583     2.64   0.008     .0016011    .0108623
       frosh |  -.1053368   .1069747    -0.98   0.325    -.3153817    .1047081
        soph |  -.1807289   .0886354    -2.04   0.042    -.3547647   -.0066932
      priGPA |   -1.52614   .4739715    -3.22   0.001    -2.456783   -.5954966
         ACT |  -.1124331    .098172    -1.15   0.253    -.3051938    .0803276
    priGPAsq |   .3682176   .0889847     4.14   0.000     .1934961    .5429391
       ACTsq |   .0041821   .0021689     1.93   0.054    -.0000766    .0084408
       _cons |   1.384812   1.239361     1.12   0.264    -1.048674    3.818298
------------------------------------------------------------------------------


Adding the squared terms — one of which is very significant, the other of which is 
marginally significant — actually increases the attendance rate effect. And it does so while 
slightly reducing the standard error on atndrte, resulting in a t statistic that is notably more 
significant than in part c. 


f. Adding the squared attendance rate is not warranted, as it is very insignificant: 


gen atndrtesq = atndrte^2
reg stndfnl atndrte frosh soph priGPA ACT priGPAsq ACTsq atndrtesq


      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------           F(  8,   671) =   25.28
       Model |  153.975323     8  19.2469154           Prob > F      =  0.0000
    Residual |  510.788245   671  .761234344           R-squared     =  0.2316
-------------+------------------------------           Adj R-squared =  0.2225
       Total |  664.763568   679  .979033237           Root MSE      =  .87249

------------------------------------------------------------------------------
     stndfnl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     atndrte |   .0058425   .0109203     0.54   0.593    -.0155996    .0272847
       frosh |  -.1053656   .1070572    -0.98   0.325    -.3155729    .1048418
        soph |  -.1808403   .0887539    -2.04   0.042     -.355109   -.0065716
      priGPA |  -1.524803    .475737    -3.21   0.001    -2.458915   -.5906902
         ACT |  -.1123423   .0982764    -1.14   0.253    -.3053087     .080624
    priGPAsq |   .3679124   .0894427     4.11   0.000      .192291    .5435337
       ACTsq |   .0041802   .0021712     1.93   0.055    -.0000829    .0084433
   atndrtesq |   2.87e-06   .0000787     0.04   0.971    -.0001517    .0001574
       _cons |   1.394292   1.267186     1.10   0.272    -1.093835     3.88242
------------------------------------------------------------------------------


The very large increase in the standard error on atndrte suggests that atndrte and $atndrte^2$
are highly collinear. In fact, their sample correlation is about .983. Importantly, the coefficient
on atndrte now has an uninteresting interpretation: it measures the partial effect of atndrte
starting from atndrte = 0. The lowest attendance rate in the sample is 6.25, with the vast
majority of students (94.3%) attending 50 percent or more of the lectures. If the quadratic term
were significant, we might want to center atndrte about its mean or median before creating the
square. Or, a more sophisticated functional form might be called for. It may be better to define
several intervals for atndrte and include dummy variables for those intervals.
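
The effect of centering before squaring can be illustrated with a small Python sketch. The attendance rates below are hypothetical (an evenly spaced grid from 50 to 100), not the data used in the text, and `corr` is a helper written for this example. Because the grid is symmetric about its mean, the centered variable is exactly uncorrelated with its square; with skewed data such as the actual attendance rates, centering would reduce the correlation but not eliminate it.

```python
def corr(a, b):
    """Sample correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical attendance rates in percent: 50, 52.5, ..., 100.
atndrte = [50 + 2.5 * k for k in range(21)]
mean_rate = sum(atndrte) / len(atndrte)

raw_sq = [a ** 2 for a in atndrte]
centered = [a - mean_rate for a in atndrte]
centered_sq = [c ** 2 for c in centered]

corr_raw = corr(atndrte, raw_sq)
corr_centered = corr(centered, centered_sq)

print(f"corr(x, x^2)             = {corr_raw:.4f}")
print(f"corr(x - mean, (x-mean)^2) = {corr_centered:.4f}")
```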

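4.15. a. Because each $x_j$ has finite second moment, $\mathrm{Var}(x\beta) < \infty$. Since $\mathrm{Var}(u) < \infty$,
$\mathrm{Cov}(x\beta, u)$ is well-defined. But each $x_j$ is uncorrelated with $u$, so $\mathrm{Cov}(x\beta, u) = 0$. Therefore,
$\mathrm{Var}(y) = \mathrm{Var}(x\beta) + \mathrm{Var}(u)$, or $\sigma_y^2 = \mathrm{Var}(x\beta) + \sigma_u^2$.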

b. This is nonsense when we view $x_i$ as a random draw along with $y_i$. The statement
“$\mathrm{Var}(u_i) = \sigma^2 = \mathrm{Var}(y_i)$ for all $i$” assumes that the regressors are nonrandom (or $\beta = 0$, which
is not a very interesting case). This is another example of how the assumption of nonrandom
regressors can lead to counterintuitive conclusions. Suppose that an element of the error term,
say $z$, which is uncorrelated with each $x_j$, suddenly becomes observed. When we add $z$ to the
regressor list, the error changes, and so does the error variance. In the vast majority of
economic applications, it makes no sense to think we have access to the entire set of factors
that one would ever want to control for, and so we should allow for error variances to change
across different sets of explanatory variables that we might use for the same response variable.
We avoid trouble by focusing on joint distributions in the population.

c. Write $R^2 = 1 - SSR/SST = 1 - (SSR/N)/(SST/N)$. Therefore,

$\mathrm{plim}(R^2) = 1 - \mathrm{plim}[(SSR/N)/(SST/N)] = 1 - [\mathrm{plim}(SSR/N)]/[\mathrm{plim}(SST/N)] = 1 - \sigma_u^2/\sigma_y^2 = \rho^2,$

where we use the fact that $SSR/N$ is a consistent estimator of $\sigma_u^2$ and $SST/N$ is a consistent
estimator of $\sigma_y^2$.

d. The derivation in part c assumed nothing about Var(u|x). The population R-squared 
depends on only the unconditional variances of u and y. Therefore, regardless of the nature of 
heteroskedasticity in Var(u|x), the usual R-squared consistently estimates the population 
R-squared. Neither R-squared nor the adjusted R-squared has desirable finite-sample 
properties, such as unbiasedness, so the only analysis we can give in any generality involves
asymptotics. The statement in the problem is simply wrong. 
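
A small simulation can make the point in part d concrete. The Python sketch below is only an illustration under an assumed design that does not come from the text: $x$ is standard normal, $u = x \cdot e$ with $e$ standard normal and independent of $x$, so that $\mathrm{Var}(u \mid x) = x^2$ (strong heteroskedasticity), $\mathrm{Var}(u) = 1$, $\mathrm{Var}(y) = 2$, and the population R-squared is 0.5. The seed and sample size are arbitrary.

```python
import random

random.seed(12345)   # arbitrary seed, fixed for reproducibility
N = 200_000

xs, ys = [], []
for _ in range(N):
    x = random.gauss(0.0, 1.0)
    u = x * random.gauss(0.0, 1.0)   # E(u|x) = 0, Var(u|x) = x^2
    xs.append(x)
    ys.append(1.0 + x + u)           # true intercept and slope both equal one

# OLS of y on a constant and x.
xbar = sum(xs) / N
ybar = sum(ys) / N
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = ybar - b1 * xbar

ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
sst = sum((y - ybar) ** 2 for y in ys)
r2 = 1.0 - ssr / sst

rho2 = 1.0 - 1.0 / 2.0   # population R-squared: 1 - Var(u)/Var(y)
print(f"sample R-squared:     {r2:.4f}")
print(f"population R-squared: {rho2:.4f}")
```

In this design the sample R-squared should be close to 0.5 despite the heteroskedasticity, as the derivation in part c implies.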

4.16. a. The proof is fairly similar to that for random sampling. First, note that the
assumptions

$N^{-1} \sum_{i=1}^N [x_i'x_i - E(x_i'x_i)] \overset{p}{\to} 0$

(which is how the WLLN is stated for i.n.i.d. sequences) and

$N^{-1} \sum_{i=1}^N E(x_i'x_i) \to A$

(which is not crucial but is pretty harmless and simplifies the proof) imply

$N^{-1} \sum_{i=1}^N x_i'x_i \overset{p}{\to} A.$

In addition, $E(x_i'u_i) = 0$ and the assumption that $N^{-1} \sum_{i=1}^N x_i'u_i$ satisfies the law of large
numbers imply

$N^{-1} \sum_{i=1}^N x_i'u_i \overset{p}{\to} 0.$

We are also given that $A$ is positive definite, which means $X'X/N$ is invertible with probability
approaching one and $(X'X/N)^{-1} \overset{p}{\to} A^{-1}$. Therefore,

$\hat\beta = \left( N^{-1} \sum_{i=1}^N x_i'x_i \right)^{-1} \left( N^{-1} \sum_{i=1}^N x_i'y_i \right)$

$= \beta + \left( N^{-1} \sum_{i=1}^N x_i'x_i \right)^{-1} \left( N^{-1} \sum_{i=1}^N x_i'u_i \right) \overset{p}{\to} \beta + A^{-1} \cdot 0 = \beta.$

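b. Because $N^{-1/2} \sum_{i=1}^N x_i'u_i \overset{d}{\to} \mathrm{Normal}(0, B)$, the sequence $N^{-1/2} \sum_{i=1}^N x_i'u_i$ is $O_p(1)$. We
already used in part a that

$\left( N^{-1} \sum_{i=1}^N x_i'x_i \right)^{-1} - A^{-1} = o_p(1).$

Now, as in the i.i.d. case, write

$\sqrt{N}(\hat\beta - \beta) = \left( N^{-1} \sum_{i=1}^N x_i'x_i \right)^{-1} \left( N^{-1/2} \sum_{i=1}^N x_i'u_i \right)$

$= \left[ \left( N^{-1} \sum_{i=1}^N x_i'x_i \right)^{-1} - A^{-1} \right] \left( N^{-1/2} \sum_{i=1}^N x_i'u_i \right) + A^{-1} \left( N^{-1/2} \sum_{i=1}^N x_i'u_i \right)$

$= o_p(1) \cdot O_p(1) + A^{-1} \left( N^{-1/2} \sum_{i=1}^N x_i'u_i \right)$

$\overset{d}{\to} \mathrm{Normal}(0, A^{-1}BA^{-1}),$

where we use the assumption $N^{-1/2} \sum_{i=1}^N x_i'u_i \overset{d}{\to} \mathrm{Normal}(0, B)$. The asymptotic variance of
$\sqrt{N}(\hat\beta - \beta)$ has the usual sandwich form, $A^{-1}BA^{-1}$.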


c. We already know that

$N^{-1} \sum_{i=1}^N x_i'x_i \overset{p}{\to} A.$

Further, by the WLLN and the assumption that $B_N \to B$,

$N^{-1} \sum_{i=1}^N u_i^2 x_i'x_i \overset{p}{\to} B.$

The hard part, just as with the i.i.d. case, is to show that replacing the $u_i$ with the OLS
residuals, $\hat u_i$, does not affect consistency. Nevertheless, under general assumptions it follows
that

$N^{-1} \sum_{i=1}^N \hat u_i^2 x_i'x_i \overset{p}{\to} B.$

Naturally, we can use the same degrees-of-freedom adjustment as in the i.i.d. case: replace $N^{-1}$
with $(N - K)^{-1}$.

d. The point of this exercise is that we are led to exactly the same heteroskedasticity-robust 
estimator whether we assume i.i.d. observations or i.n.i.d. observations. In particular, even if 
unconditional variances are constant — as they must be in the i.i.d. case — we still might need 
heteroskedasticity-robust standard errors. In the i.n.i.d. case, the robust variance matrix 
estimator allows for changing unconditional variances as well as conditional variances that 
depend on $x_i$.
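
The sandwich form is easiest to see in the scalar case. The Python sketch below, which is an illustration and not part of the original solution, computes the OLS slope in a regression through the origin with a single regressor (K = 1), along with the heteroskedasticity-robust standard error based on $\hat A^{-1} \hat B \hat A^{-1}/N$ and the usual nonrobust standard error. The data and the function name are hypothetical, and no degrees-of-freedom adjustment is applied to the robust version.

```python
import math


def ols_with_robust_se(x, y):
    """OLS of y on a single regressor x (no intercept), returning the slope,
    the heteroskedasticity-robust standard error, and the usual one."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - b * xi for xi, yi in zip(x, y)]

    a_hat = sxx / n                                             # estimates A = E(x'x)
    b_hat = sum((ui * xi) ** 2 for ui, xi in zip(resid, x)) / n  # estimates B
    avar = b_hat / a_hat ** 2                                   # A^{-1} B A^{-1}, K = 1
    se_robust = math.sqrt(avar / n)

    s2 = sum(ui * ui for ui in resid) / (n - 1)                 # usual variance estimate
    se_usual = math.sqrt(s2 / sxx)
    return b, se_robust, se_usual


# Hypothetical data, used only to exercise the formulas.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]
slope, se_robust, se_usual = ols_with_robust_se(x, y)
print(f"slope = {slope:.4f}, robust se = {se_robust:.4f}, usual se = {se_usual:.4f}")
```

The same formula applies whether the draws are i.i.d. or i.n.i.d., which is the point of part d.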


4.17. We know that, in general,

$\mathrm{Avar}\, \sqrt{N}(\hat\beta - \beta) = [E(x'x)]^{-1} E(u^2 x'x) [E(x'x)]^{-1}.$

Now we just apply iterated expectations to the matrix in the middle:

$E(u^2 x'x) = E[E(u^2 x'x \mid x)] = E[E(u^2 \mid x) x'x] = E[h(x) x'x].$

4.18. a. This is a fairly common misconception, or at least misstatement. Recall that the
distribution of any random draw, $u_i$, is the population distribution of $u$. But, of course, the
population distribution of $u$ is what it is; it does not change with the sample size. In fact, it has
nothing to do with the sample size. Therefore, the random draws $u_i$ have the same
distribution regardless of $N$. A correct statement is that the standardized average of the errors,
$N^{-1/2} \sum_{i=1}^N u_i = \sqrt{N}\, \bar u$, approaches normality as $N \to \infty$. This is a much different statement. (In
regression analysis, we use the fact that $N^{-1/2} \sum_{i=1}^N x_i'u_i$ generally converges to a multivariate
normal distribution, which implies the convergence of $N^{-1/2} \sum_{i=1}^N u_i$ to normality when $x_i$
contains unity.)

b. It is tempting to think that a single squared OLS residual can consistently
estimate a conditional mean, $E(u_i^2 \mid x_i) = h(x_i)$, but there is no sense in which this statement is
true. It is not even clear what we would mean by it, but we can make some headway by writing
$\hat u_i^2 = u_i^2 - 2u_i x_i(\hat\beta - \beta) + [x_i(\hat\beta - \beta)]^2$. Now, we can conclude $\hat u_i^2 - u_i^2 \overset{p}{\to} 0$ as $N \to \infty$ because
$\hat\beta \overset{p}{\to} \beta$. But remember $u_i^2 = h(x_i) + v_i$ where $E(v_i \mid x_i) = 0$. There is no sense in which $u_i^2$ is a
consistent estimator of $h(x_i)$; they do not even depend on the sample size $N$.

It was the view that we needed $\hat u_i^2$ to be a good estimate of $E(u_i^2 \mid x_i)$ that possibly held up
progress on heteroskedasticity-consistent covariance matrices. Fortunately, all we need to
consistently estimate is the population mean

$B = E(u^2 x'x),$

for which the obvious consistent (and unbiased) estimator is

$N^{-1} \sum_{i=1}^N u_i^2 x_i'x_i.$


The rest is demonstrating that replacing the implicit $\beta$ with $\hat\beta$ preserves consistency (not
unbiasedness). As we know, this requires some tricky algebra with $o_p(1)$ and $O_p(1)$, but the
work is not too onerous.


Solutions to Chapter 5 Problems

5.1. Define $x_1 = (z_1, y_2)$ and $x_2 = \hat v_2$, and let $\hat\beta = (\hat\beta_1', \hat\rho_1)'$ be the OLS estimator from (5.52),
where $\hat\beta_1 = (\hat\delta_1', \hat\alpha_1)'$. Using the hint, $\hat\beta_1$ can also be obtained by partitioned regression:

(i) Regress $x_1$ onto $\hat v_2$ and save the residuals, say $\ddot{x}_1$.

(ii) Regress $y_1$ onto $\ddot{x}_1$.

But when we regress $z_1$ onto $\hat v_2$ the residuals are just $z_1$ because $\hat v_2$ is orthogonal in sample
to $z$. (More precisely, $\sum_{i=1}^N z_i' \hat v_{i2} = 0$.) Further, because we can write $y_2 = \hat y_2 + \hat v_2$, where $\hat y_2$
and $\hat v_2$ are orthogonal in sample, the residuals from regressing $y_2$ onto $\hat v_2$ are simply the first
stage fitted values, $\hat y_2$. In other words, $\ddot{x}_1 = (z_1, \hat y_2)$. But the 2SLS estimator of $\beta_1$ is obtained
exactly from the OLS regression $y_1$ on $z_1$, $\hat y_2$.
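
The algebraic equivalence can be checked numerically in the simplest possible case. The Python sketch below is an illustration under simplifying assumptions that are not in the problem: there are no exogenous regressors $z_1$ and no intercept, and a single instrument $z$ is used for $y_2$. The data are hypothetical and `ols_two_regressors` is a helper written for this example. It compares the IV (2SLS) estimate with the coefficient on $y_2$ from the OLS regression of $y_1$ on $y_2$ and the first-stage residuals $\hat v_2$.

```python
def ols_two_regressors(y, a, b):
    """OLS of y on two regressors a and b (no intercept), obtained by
    solving the 2x2 normal equations directly."""
    saa = sum(p * p for p in a)
    sbb = sum(q * q for q in b)
    sab = sum(p * q for p, q in zip(a, b))
    say = sum(p * t for p, t in zip(a, y))
    sby = sum(q * t for q, t in zip(b, y))
    det = saa * sbb - sab * sab
    coef_a = (say * sbb - sab * sby) / det
    coef_b = (saa * sby - sab * say) / det
    return coef_a, coef_b


# Hypothetical data: instrument z, endogenous regressor y2, outcome y1.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
y2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y1 = [1.0, 3.0, 2.0, 6.0, 5.0]

# First stage: regress y2 on z (no intercept) and keep the residuals.
pi_hat = sum(zi * yi for zi, yi in zip(z, y2)) / sum(zi * zi for zi in z)
v2_hat = [yi - pi_hat * zi for zi, yi in zip(z, y2)]

# Control function regression: y1 on y2 and v2_hat.
alpha_cf, rho_cf = ols_two_regressors(y1, y2, v2_hat)

# IV (2SLS) estimate with one instrument and one endogenous regressor.
alpha_iv = sum(zi * yi for zi, yi in zip(z, y1)) / sum(zi * yi for zi, yi in zip(z, y2))

print(f"control function estimate: {alpha_cf:.6f}")
print(f"IV (2SLS) estimate:        {alpha_iv:.6f}")
```

The two estimates agree up to floating-point error, as the partitioned regression argument predicts.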

5.2. a. Unobserved factors that tend to make an individual healthier also tend to make that 
person exercise more. For example, if health is a cardiovascular measure, people with a history 
of heart problems are probably less likely to exercise. Unobserved factors such as prior health 
or family history are contained in u1, and so we are worried about correlation between exercise 
and u1. Self-selection into exercising predicts that the benefits of exercising will be, on 
average, overestimated. Ideally, the amount of exercise could be randomized across a sample 
of people, but this can be difficult. 

b. If people do not systematically choose the location of their homes and jobs relative to
health clubs based on unobserved health characteristics, then it is reasonable to believe that
disthome and distwork are uncorrelated with $u_1$. But the location of health clubs is not
necessarily exogenous. Clubs may tend to be built near neighborhoods where residents have
higher income and wealth, on average, and these factors can certainly affect overall health. It
may make sense to choose residents from neighborhoods with very similar characteristics but
where one neighborhood is located near a health club.


c. The reduced form for exercise is

$exercise = \pi_0 + \pi_1 age + \pi_2 weight + \pi_3 height + \pi_4 male + \pi_5 work + \pi_6 disthome + \pi_7 distwork + u_1.$

For identification we need at least one of $\pi_6$ and $\pi_7$ to be different from zero. This
assumption can fail if the amount that people exercise is not systematically related to distances
to the nearest health club.

d. An F test of $H_0: \pi_6 = 0, \pi_7 = 0$ is the simplest way to test the identification assumption
in part c. As usual, it would be a good idea to compute a heteroskedasticity-robust version.

5.3. a. There may be unobserved health factors correlated with smoking behavior that 
affect infant birth weight. For example, women who smoke during pregnancy may, on average, 
drink more coffee or alcohol, or eat less nutritious meals. 

b. Basic economics says that packs should be negatively correlated with cigarette price, 
although the correlation might be small (especially because price is aggregated at the state 
level). At first glance it seems that cigarette price should be exogenous in equation (5.54), but 
we must be a little careful. One component of cigarette price is the state tax on cigarettes. 
States that have lower taxes on cigarettes may also have lower quality of health care, on 
average. Quality of health care is in u, and so maybe cigarette price fails the exogeneity 
requirement for an IV. 


c. OLS is followed by 2SLS (IV, in this case): 


reg lbwght male parity lfaminc packs 


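      Source |       SS       df       MS              Number of obs =    1388
-------------+------------------------------           F(  4,  1383) =   12.55
       Model |  1.76664363     4  .441660908           Prob > F      =  0.0000
    Residual |    48.65369  1383  .035179819
-------------+------------------------------
       Total |  50.4203336  1387  .036352079

------------------------------------------------------------------------------
      lbwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .0262407   .0100894     2.60   0.009     .0064486    .0460328
      parity |   .0147292   .0056646     2.60   0.009     .0036171    .0258414
     lfaminc |   .0180498   .0055837     3.23   0.001     .0070964    .0290032
       packs |  -.0837281   .0171209    -4.89   0.000    -.1173139   -.0501423
       _cons |   4.675618   .0218813   213.68   0.000     4.632694    4.718542
------------------------------------------------------------------------------


ivreg lbwght male parity lfaminc (packs = cigprice)


Instrumental variables (2SLS) regression


      Source |       SS       df       MS              Number of obs =    1388
-------------+------------------------------
       Model |  -91.350027     4 -22.8375067
    Residual |  141.770361  1383  .102509299
-------------+------------------------------
       Total |  50.4203336  1387  .036352079

------------------------------------------------------------------------------
      lbwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       packs |   .7971063   1.086275     0.73   0.463    -1.333819    2.928031
        male |   .0298205    .017779     1.68   0.094    -.0050562    .0646972
      parity |  -.0012391   .0219322    -0.06   0.955     -.044263    .0417848
     lfaminc |    .063646   .0570128     1.12   0.264    -.0481949    .1754869
       _cons |   4.467861   .2588289    17.26   0.000     3.960122    4.975601
------------------------------------------------------------------------------
Instrumented:  packs
Instruments:   male parity lfaminc cigprice
------------------------------------------------------------------------------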


The difference between OLS and IV in the estimated effect of packs on bwght is huge.
With the OLS estimate, one more pack of cigarettes is estimated to reduce bwght by about
8.4%, and is statistically significant. The IV estimate has the opposite sign, is huge in
magnitude, and is not statistically significant. The sign and size of the smoking effect are not
realistic.
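
The 8.4% figure uses the usual approximation that 100 times the coefficient is the percentage change in a log-level model. As a supplementary check, the Python sketch below compares that approximation with the exact percentage change implied by the OLS coefficient on packs; the variable names are illustrative.

```python
import math

# OLS coefficient on packs from the lbwght regression above.
b_packs_ols = -0.0837281

approx_pct = 100.0 * b_packs_ols                    # log approximation
exact_pct = 100.0 * (math.exp(b_packs_ols) - 1.0)   # exact change in bwght

print(f"approximate change: {approx_pct:.2f}%")
print(f"exact change:       {exact_pct:.2f}%")
```

The exact figure is a reduction of roughly 8.0%, so the approximation is adequate for a coefficient of this size.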


d. We can see the problem with IV by estimating the reduced form for packs. 


reg packs male parity lfaminc cigprice 


      Source |       SS       df       MS              Number of obs =    1388
-------------+------------------------------
       Model |  3.76705108     4   .94176277           Prob > F      =  0.0000
    Residual |  119.929078  1383  .086716615           R-squared     =  0.0305
-------------+------------------------------           Adj R-squared =  0.0276
       Total |  123.696129  1387  .089182501           Root MSE      =  .29448

------------------------------------------------------------------------------
       packs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |  -.0047261   .0158539    -0.30   0.766    -.0358264    .0263742
      parity |   .0181491   .0088802     2.04   0.041     .0007291    .0355692
     lfaminc |  -.0526374   .0086991    -6.05   0.000    -.0697023   -.0355724
    cigprice |    .000777   .0007763     1.00   0.317    -.0007459    .0022999
       _cons |   .1374075   .1040005     1.32   0.187    -.0666084    .3414234
------------------------------------------------------------------------------


The reduced form estimates show that cigprice does not significantly affect packs. In fact, 
the coefficient on cigprice does not have the sign we expect. Thus, cigprice fails as an IV for 
packs because cigprice is not partially correlated with packs with a sensible sign for the 
correlation. This is separate from the problem that cigprice may not truly be exogenous in the 
birth weight equation. 


5.4. a. Here are the OLS results: 


reg lwage educ exper expersq black south smsa reg661-reg668 smsa66 


      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 15,  2994) =   85.48
       Model |  177.695591    15  11.8463727           Prob > F      =  0.0000
    Residual |  414.946054  2994  .138592536           R-squared     =  0.2998
-------------+------------------------------           Adj R-squared =  0.2963
       Total |  592.641645  3009  .196956346           Root MSE      =  .37228

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0746933   .0034983    21.35   0.000     .0678339    .0815527
       exper |    .084832   .0066242    12.81   0.000     .0718435    .0978205
     expersq |   -.002287   .0003166    -7.22   0.000    -.0029079   -.0016662
       black |  -.1990123   .0182483   -10.91   0.000    -.2347927   -.1632318
       south |   -.147955   .0259799    -5.69   0.000    -.1988952   -.0970148
        smsa |   .1363845   .0201005     6.79   0.000     .0969724    .1757967
      reg661 |  -.1185698   .0388301    -3.05   0.002     -.194706   -.0424335
      reg662 |  -.0222026   .0282575    -0.79   0.432    -.0776088    .0332036
      reg663 |   .0259703   .0273644     0.95   0.343    -.0276846    .0796251
      reg664 |  -.0634942   .0356803    -1.78   0.075    -.1334546    .0064662
      reg665 |   .0094551   .0361174     0.26   0.794    -.0613623    .0802725
      reg666 |   .0219476   .0400984     0.55   0.584    -.0566755    .1005708
      reg667 |  -.0005887   .0393793    -0.01   0.988     -.077802    .0766245
      reg668 |  -.1750058   .0463394    -3.78   0.000     -.265866   -.0841456
      smsa66 |   .0262417   .0194477     1.35   0.177    -.0118905    .0643739
       _cons |   4.739377   .0715282    66.26   0.000     4.599127    4.879626
------------------------------------------------------------------------------


The estimated return to education is about 7.5%, with a very large t statistic. These
reproduce the estimates from Table 2, Column (2) in Card (1995).


b. The reduced form for educ is

. reg educ exper expersq black south smsa reg661-reg668 smsa66 nearc4

      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 15,  2994) =  182.13
       Model |  10287.6179    15  685.841194           Prob > F      =  0.0000
    Residual |  11274.4622  2994  3.76568542           R-squared     =  0.4771
-------------+------------------------------           Adj R-squared =  0.4745
       Total |  21562.0801  3009  7.16586243           Root MSE      =  1.9405

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.4125334   .0336996   -12.24   0.000    -.4786101   -.3464566
     expersq |   .0008686   .0016504     0.53   0.599    -.0023674    .0041046
       black |  -.9355287   .0937348    -9.98   0.000     -1.11932   -.7517377
       south |  -.0516126   .1354284    -0.38   0.703    -.3171548    .2139296
        smsa |   .4021825   .1048112     3.84   0.000     .1966732    .6076918
      reg661 |   -.210271   .2024568    -1.04   0.299    -.6072395    .1866975
      reg662 |  -.2889073   .1473395    -1.96   0.050    -.5778042   -.0000105
      reg663 |  -.2382099   .1426357    -1.67   0.095    -.5178838    .0414639
      reg664 |   -.093089   .1859827    -0.50   0.617    -.4577559    .2715779
      reg665 |  -.4828875   .1881872    -2.57   0.010    -.8518767   -.1138982
      reg666 |  -.5130857   .2096352    -2.45   0.014    -.9241293   -.1020421
      reg667 |  -.4270887   .2056208    -2.08   0.038    -.8302611   -.0239163
      reg668 |   .3136204   .2416739     1.30   0.194    -.1602434    .7874841
      smsa66 |   .0254805   .1057692     0.24   0.810    -.1819071    .2328682
      nearc4 |   .3198989   .0878638     3.64   0.000     .1476194    .4921785
       _cons |   16.84852   .2111222    79.80   0.000     16.43456    17.26248
------------------------------------------------------------------------------


The important coefficient is on nearc4. Statistically, educ and nearc4 are partially 
correlated, and in a way that makes sense: holding other factors in the reduced form fixed, 
someone living near a four-year college at age 16 has, on average, almost one-third of a year
more education than a person not near a four-year college at age 16. This is not a trivial effect,
so nearc4 passes the requirement that it is partially correlated with educ.


c. Here are the IV estimates:

. ivreg lwage exper expersq black south smsa reg661-reg668 smsa66 (educ = nearc4)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 15,  2994) =   51.01
       Model |  141.146813    15  9.40978752           Prob > F      =  0.0000
    Residual |  451.494832  2994  .150799877           R-squared     =  0.2382
-------------+------------------------------           Adj R-squared =  0.2343
       Total |  592.641645  3009  .196956346           Root MSE      =  .38833

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1315038   .0549637     2.39   0.017     .0237335    .2392742
       exper |   .1082711   .0236586     4.58   0.000     .0618824    .1546598
     expersq |  -.0023349   .0003335    -7.00   0.000    -.0029888    -.001681
       black |  -.1467757   .0538999    -2.72   0.007    -.2524603   -.0410912
       south |  -.1446715   .0272846    -5.30   0.000      -.19817    -.091173
        smsa |   .1118083    .031662     3.53   0.000     .0497269    .1738898
      reg661 |  -.1078142   .0418137    -2.58   0.010    -.1898007   -.0258278
      reg662 |  -.0070465   .0329073    -0.21   0.830    -.0715696    .0574767
      reg663 |   .0404445   .0317806     1.27   0.203    -.0218694    .1027585
      reg664 |  -.0579172   .0376059    -1.54   0.124    -.1316532    .0158189
      reg665 |   .0384577   .0469387     0.82   0.413    -.0535777     .130493
      reg666 |   .0550887   .0526597     1.05   0.296    -.0481642    .1583416
      reg667 |    .026758   .0488287     0.55   0.584    -.0689832    .1224992
      reg668 |  -.1908912   .0507113    -3.76   0.000    -.2903238   -.0914586
      smsa66 |   .0185311   .0216086     0.86   0.391    -.0238381    .0609003
       _cons |   3.773965    .934947     4.04   0.000     1.940762    5.607169
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq black south smsa reg661 reg662 reg663 reg664
               reg665 reg666 reg667 reg668 smsa66 nearc4


The estimated return to education has increased to about 13.2%, but notice how wide the 
95% confidence interval is: 2.4% to 23.9%. By contrast, the OLS confidence interval is about 
6.8% to 8.2%, which is much tighter. Of course, OLS could be inconsistent, in which case a 
tighter CI is of little value. But the estimated return to education is higher with IV, something 
that seems a bit counterintuitive. 
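
The comparison of the two confidence intervals can be checked by hand. The following Python sketch is not part of the original Stata session; it rebuilds both intervals from the reported coefficients and standard errors, assuming the normal critical value 1.96 in place of the exact t critical value Stata uses (with 2,994 degrees of freedom the two are nearly identical). The function name normal_ci is hypothetical.

```python
def normal_ci(coef, se, crit=1.96):
    """Approximate 95% confidence interval: coef +/- crit * se.

    crit = 1.96 is the standard normal critical value, used here as an
    assumption in place of Stata's exact t critical value.
    """
    half_width = crit * se
    return coef - half_width, coef + half_width


# Coefficients and standard errors on educ copied from the output above.
ols_lo, ols_hi = normal_ci(0.0746933, 0.0034983)
iv_lo, iv_hi = normal_ci(0.1315038, 0.0549637)

# The IV interval is wider than the OLS interval by the ratio of standard errors.
width_ratio = (iv_hi - iv_lo) / (ols_hi - ols_lo)

print(f"OLS 95% CI: [{ols_lo:.4f}, {ols_hi:.4f}]")
print(f"IV  95% CI: [{iv_lo:.4f}, {iv_hi:.4f}]")
print(f"IV interval is about {width_ratio:.1f} times as wide")
```

The ratio of interval widths is just the ratio of the two standard errors, which is why the loss of precision from using nearc4 as the only instrument is so visible.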

One possible explanation is that educ suffers from classical errors-in-variables (CEV). Therefore,
while OLS would tend to overestimate the return to schooling because of omitted “ability,”
classical measurement error in educ leads to an attenuation bias. Measurement error may help
explain why the IV estimate is larger, but it is not entirely convincing. It seems unlikely that
educ satisfies the CEV assumptions. For example, if we think the measurement error is due to
truncation (people are asked about highest grade completed, not actual years of schooling),
then educ is always less than or equal to educ*, and the measurement error could not be
independent of educ*. If we think the mismeasurement is due to unobserved quality of
schooling, it seems likely that quality of schooling, which is part of the measurement error, is
positively correlated with actual amount of schooling. This, too, violates the CEV assumptions.
Another possibility for the much higher IV estimate comes out of the recent treatment effect
literature, which is covered in Section 21.4. Of course, we must also remember that the point
estimates, particularly the IV estimate, are subject to substantial sampling variation. At this
point, we do not even know if OLS and IV are statistically different from each other. See
Problem 6.1.

d. When nearc2 is added to the reduced form of educ it has a coefficient (standard error) of 
.123 (.077), compared with .321 (.089) for nearc4. Therefore, nearc4 has a much stronger 
ceteris paribus relationship with educ; nearc2 is only marginally statistically significant once 
nearc4 has been included. The joint F test gives F = 7.89 with p-value = .004.

The 2SLS estimate of the return to education becomes about 15.7%, with 95% CI given by 
5.4% to 26%. The CI is still very wide. 


5.5. Under the null hypothesis that $q$ and $\mathbf{z}_2$ are uncorrelated, $\mathbf{z}_1$ and $\mathbf{z}_2$ are exogenous in
(5.55) because each is uncorrelated with $u_1$. Unfortunately, $y_2$ is correlated with $u_1$, and so the
regression of $y_1$ on $\mathbf{z}_1, y_2, \mathbf{z}_2$ does not produce a consistent estimator of $\mathbf{0}$ on $\mathbf{z}_2$ even when
$E(\mathbf{z}_2'q) = \mathbf{0}$. We could find that $\hat{\boldsymbol{\psi}}_1$ from this regression is statistically different from zero even
when $q$ and $\mathbf{z}_2$ are uncorrelated, in which case we would incorrectly conclude that $\mathbf{z}_2$ is not a
valid IV candidate. Or, we might fail to reject $H_0: \boldsymbol{\psi}_1 = \mathbf{0}$ when $\mathbf{z}_2$ and $q$ are correlated, in
which case we incorrectly conclude that the elements in $\mathbf{z}_2$ are valid as instruments.

The point of this exercise is that one cannot simply add instrumental variable candidates in
the structural equation and then test for significance of these variables using OLS estimation.
This is the sense in which identification cannot be tested: we cannot test whether all of the IV
candidates are uncorrelated with $q$. With a single endogenous variable, we must take a stand
that at least one element of $\mathbf{z}_2$ is uncorrelated with $q$.


5.6. a. By definition, the reduced form is the linear projection

$$L(q_1 | 1, \mathbf{x}, q_2) = \pi_0 + \mathbf{x}\boldsymbol{\pi}_1 + \pi_2 q_2,$$

and we want to show that $\boldsymbol{\pi}_1 = \mathbf{0}$ when $q_2$ is uncorrelated with $\mathbf{x}$. Now, because $q_2$ is a linear
function of $q$ and $a_2$, and $a_2$ is uncorrelated with $\mathbf{x}$, $q_2$ is uncorrelated with $\mathbf{x}$ if and only if $q$ is
uncorrelated with $\mathbf{x}$. Assuming then that $q$ and $\mathbf{x}$ are uncorrelated, $q_1$ is also uncorrelated with
$\mathbf{x}$. A basic fact about linear projections is that, because $q_1$ and $q_2$ are each uncorrelated with
the vector $\mathbf{x}$, $\boldsymbol{\pi}_1 = \mathbf{0}$. This claim follows from Property LP.7: $\boldsymbol{\pi}_1$ can be obtained by first
projecting $\mathbf{x}$ on $1, q_2$ and obtaining the population residuals, say $\mathbf{r}$. Then, project $q_1$ onto $\mathbf{r}$. But
because $\mathbf{x}$ and $q_2$ are orthogonal, $\mathbf{r} = \mathbf{x} - \boldsymbol{\mu}_x$. Projecting $q_1$ on $(\mathbf{x} - \boldsymbol{\mu}_x)$ just gives the zero
vector because $E[(\mathbf{x} - \boldsymbol{\mu}_x)'q_1] = \mathbf{0}$. Therefore, $\boldsymbol{\pi}_1 = \mathbf{0}$.

b. If $q_2$ and $\mathbf{x}$ are correlated then $\boldsymbol{\pi}_1 \neq \mathbf{0}$, and $\mathbf{x}$ appears in the reduced form for $q_1$. It is not
realistic to assume that $q_2$ and $\mathbf{x}$ are uncorrelated. Under the multiple indicator assumptions,
assuming $\mathbf{x}$ and $q_2$ are uncorrelated is the same as assuming $q$ and $\mathbf{x}$ are uncorrelated. If we
believe $q$ and $\mathbf{x}$ are uncorrelated then there is no need to collect indicators on $q$ to consistently
estimate $\boldsymbol{\beta}$: we could simply put $q$ into the error term and estimate $\boldsymbol{\beta}$ from an OLS regression of
$y$ on $1, \mathbf{x}$. (Of course, if $q$ and $\mathbf{x}$ are uncorrelated we could, in general, gain efficiency for
estimating $\boldsymbol{\beta}$ by including $q$ as an extra regressor.)


5.7. a. If we plug $q = (1/\delta_1)q_1 - (1/\delta_1)a_1$ into equation (5.45) we get

$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_K x_K + \eta_1 q_1 + v - \eta_1 a_1, \qquad (5.56)$$

where $\eta_1 \equiv (1/\delta_1)$. Now, because the $z_h$ are redundant in (5.45), they are uncorrelated with
the structural error, $v$ (by definition of redundancy). Further, we have assumed that the $z_h$ are
uncorrelated with $a_1$. Since each $x_j$ is also uncorrelated with $v - \eta_1 a_1$ we can estimate (5.56)
by 2SLS using instruments $(1, x_1, \ldots, x_K, z_1, z_2, \ldots, z_M)$ to get consistent estimators of the $\beta_j$ and $\eta_1$.
Given all of the zero correlation assumptions, what we need for identification is that at
least one of the $z_h$ appears in the reduced form for $q_1$. More formally, in the linear projection

$$q_1 = \pi_0 + \pi_1 x_1 + \ldots + \pi_K x_K + \pi_{K+1} z_1 + \ldots + \pi_{K+M} z_M + r_1,$$

at least one of $\pi_{K+1}, \ldots, \pi_{K+M}$ must be different from zero.

b. We need family background variables to be redundant in the log(wage) equation once
ability (and other factors, such as educ and exper) have been controlled for. The idea here is
that family background may influence ability but should have no partial effect on log(wage)
once ability has been accounted for. For the rank condition to hold, we need family
background variables to be correlated with the indicator, $q_1$, say IQ, once the $x_j$ have been
netted out. This is likely to be true if we think that family background and ability are
(partially) correlated.


c. Applying the procedure to the data set in NLS80.RAW gives the following results:

. ivreg lwage exper tenure educ married south urban black (iq = meduc feduc sibs)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     722
-------------+------------------------------           F(  8,   713) =   25.81
       Model |  19.6029198     8  2.45036497           Prob > F      =  0.0000
    Residual |  107.208996   713  .150363248           R-squared     =  0.1546
-------------+------------------------------           Adj R-squared =  0.1451
       Total |  126.811916   721  .175883378           Root MSE      =  .38777

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          iq |   .0154368   .0077077     2.00   0.046     .0003044    .0305692
       exper |   .0162185   .0040076     4.05   0.000     .0083503    .0240867
      tenure |   .0076754   .0030956     2.48   0.013     .0015979    .0137529
        educ |   .0161809   .0261982     0.62   0.537     -.035254    .0676158
     married |   .1901012   .0467592     4.07   0.000     .0982991    .2819033
       south |   -.047992   .0367425    -1.31   0.192    -.1201284    .0241444
       urban |   .1869376   .0327986     5.70   0.000     .1225442    .2513311
       black |   .0400269   .1138678     0.35   0.725    -.1835294    .2635832
       _cons |   4.471616    .468913     9.54   0.000        3.551    5.392231
------------------------------------------------------------------------------
Instrumented:  iq
Instruments:   exper tenure educ married south urban black meduc feduc sibs

. ivreg lwage exper tenure educ married south urban black (kww = meduc feduc sibs)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     722
-------------+------------------------------           F(  8,   713) =   25.70
       Model |   19.820304     8    2.477538           Prob > F      =  0.0000
    Residual |  106.991612   713  .150058361           R-squared     =  0.1563
-------------+------------------------------           Adj R-squared =  0.1468
       Total |  126.811916   721  .175883378           Root MSE      =  .38737

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         kww |   .0249441   .0150576     1.66   0.098    -.0046184    .0545067
       exper |   .0068682   .0067471     1.02   0.309    -.0063783    .0201147
      tenure |   .0051145   .0037739     1.36   0.176    -.0022947    .0125238
        educ |   .0260808   .0255051     1.02   0.307    -.0239933    .0761549
     married |   .1605273   .0529759     3.03   0.003     .0565198    .2645347
       south |   -.091887   .0322147    -2.85   0.004    -.1551341   -.0286399
       urban |   .1484003   .0411598     3.61   0.000     .0675914    .2292093
       black |  -.0424452   .0893695    -0.47   0.635    -.2179041    .1330137
       _cons |   5.217818   .1627592    32.06   0.000     4.898273    5.537362
------------------------------------------------------------------------------
Instrumented:  kww
Instruments:   exper tenure educ married south urban black meduc feduc sibs


Even though there are 935 men in the sample, only 722 are used for the estimation because
data are missing on meduc and feduc.

The return to education is estimated to be small and insignificant whether IQ or KWW
is used as the indicator. This could be because family background variables do not satisfy the
appropriate redundancy condition, or they might be correlated with $a_1$. (In both first-stage
regressions, the F statistic for joint significance of meduc, feduc, and sibs has a p-value below
.002, so it seems the family background variables have some partial correlation with the ability
indicators.)

5.8. a. Plug in the indicator $q_1$ for $q$ and the measurement $x_K$ for $x_K^*$, being sure to keep
track of the errors:

$$y = \gamma_0 + \beta_1 x_1 + \ldots + \beta_K x_K + \gamma_1 q_1 + v - \beta_K e_K - \gamma_1 a_1$$
$$\;\; = \gamma_0 + \beta_1 x_1 + \ldots + \beta_K x_K + \gamma_1 q_1 + u,$$

where $\gamma_1 = 1/\delta_1$ and $u \equiv v - \beta_K e_K - \gamma_1 a_1$. Now, if the variables $z_1, \ldots, z_M$ are redundant in the structural equation (so
they are uncorrelated with $v$), and uncorrelated with the measurement error $e_K$ and the
indicator error $a_1$, we can use these as IVs for $x_K$ and $q_1$ in 2SLS. We need $M \geq 2$ because we
have two explanatory variables, $x_K$ and $q_1$, that are possibly correlated with the composite
error $u$.


b. The Stata results are:

. ivreg lwage exper tenure married south urban black (educ iq = kww meduc feduc sibs)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     722
-------------+------------------------------           F(  8,   713) =   18.74
       Model | -.295429993     8 -.036928749           Prob > F      =  0.0000
    Residual |  127.107346   713  .178271172           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  126.811916   721  .175883378           Root MSE      =  .42222

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1646904   .1132659     1.45   0.146    -.0576843    .3870651
          iq |  -.0102736   .0200124    -0.51   0.608    -.0495638    .0290166
       exper |   .0313987   .0122537     2.56   0.011      .007341    .0554564
      tenure |   .0070476   .0033717     2.09   0.037     .0004279    .0136672
     married |   .2133365   .0535285     3.99   0.000     .1082442    .3184289
       south |  -.0941667   .0506389    -1.86   0.063    -.1935859    .0052525
       urban |   .1680721   .0384337     4.37   0.000     .0926152    .2435289
       black |  -.2345713   .2241568    -1.04   0.297    -.6758356    .2066929
       _cons |   4.932962   .4870124    10.13   0.000     3.976812    5.889112
------------------------------------------------------------------------------
Instrumented:  educ iq
Instruments:   exper tenure married south urban black kww meduc feduc sibs


The estimated return to education is very large, but imprecisely estimated. The 95% 
confidence interval is very wide, and easily includes zero. Interestingly, the coefficient on iq is 
actually negative, and not statistically different from zero. The large IV estimate of the return 
to education and the insignificant ability indicator lend some support to the idea that omitted 
ability is less of a problem than schooling measurement error in the standard log(wage) model 
estimated by OLS. But the evidence is not very convincing given the very wide confidence 
interval for the educ coefficient. 

5.9. Define $\theta_4 = \beta_4 - \beta_3$, so that $\beta_4 = \beta_3 + \theta_4$. Plugging this expression into the equation
and rearranging gives

$$\log(wage) = \beta_0 + \beta_1 exper + \beta_2 exper^2 + \beta_3(twoyr + fouryr) + \theta_4 fouryr + u$$
$$\;\; = \beta_0 + \beta_1 exper + \beta_2 exper^2 + \beta_3 totcoll + \theta_4 fouryr + u,$$

where $totcoll = twoyr + fouryr$. Now, just estimate the latter equation by 2SLS using exper,
exper², dist2yr and dist4yr as the full set of instruments. We can use the t statistic on $\hat{\theta}_4$ to
test $H_0: \theta_4 = 0$ against $H_1: \theta_4 > 0$.

5.10. a. For $\hat{\beta}_1$, the lower right hand element in the general formula (5.24) with $\mathbf{x} = (1, x)$
and $\mathbf{z} = (1, z)$ is

$$\sigma^2 \big/ [\mathrm{Cov}(z,x)^2/\mathrm{Var}(z)].$$

Alternatively, you can derive this formula directly by writing

$$\sqrt{N}(\hat{\beta}_1 - \beta_1) = \left[N^{-1}\sum_{i=1}^N (z_i - \bar{z})(x_i - \bar{x})\right]^{-1}\left[N^{-1/2}\sum_{i=1}^N (z_i - \bar{z})u_i\right].$$

Now, $\rho_{zx}^2 = [\mathrm{Cov}(z,x)]^2/(\sigma_z^2\sigma_x^2)$, so simple algebra shows that the asymptotic variance is
$\sigma^2/(\rho_{zx}^2\sigma_x^2)$. The asymptotic variance for the OLS estimator is $\sigma^2/\sigma_x^2$. Thus, the difference is the
presence of $\rho_{zx}^2$ in the denominator of the IV asymptotic variance.

b. Naturally, as the error variance $\sigma^2$ increases so does the asymptotic variance of the
IV estimator. More variance in $x$ in the population is better for estimating $\beta_1$: as $\sigma_x^2$ increases
the asymptotic variance decreases. These effects are identical to the findings for OLS. A larger
correlation between $z$ and $x$ reduces the asymptotic variance of the IV estimator. As $\rho_{zx} \to 0$
the asymptotic variance increases without bound. This illustrates why an instrument that is
only weakly correlated with $x$ can lead to very imprecise IV estimators.
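
To see how quickly precision deteriorates as the instrument weakens, the two asymptotic variance formulas from part a can be evaluated directly. The Python sketch below is only an illustration: the function names and the parameter values are hypothetical, and the formulas coded are $\sigma^2/\sigma_x^2$ for OLS and $\sigma^2/(\rho_{zx}^2\sigma_x^2)$ for IV.

```python
def avar_ols(sigma2, var_x):
    """Asymptotic variance of sqrt(N)(b_OLS - b): sigma^2 / Var(x)."""
    return sigma2 / var_x


def avar_iv(sigma2, var_x, rho_zx):
    """Asymptotic variance of sqrt(N)(b_IV - b): sigma^2 / (rho^2 * Var(x))."""
    if rho_zx == 0:
        raise ValueError("rho_zx = 0: the instrument is irrelevant and IV is not identified")
    return sigma2 / (rho_zx ** 2 * var_x)


# Illustrative (made-up) values: unit error variance and unit variance of x.
sigma2, var_x = 1.0, 1.0
for rho in (0.9, 0.5, 0.1, 0.01):
    ratio = avar_iv(sigma2, var_x, rho) / avar_ols(sigma2, var_x)
    print(f"rho_zx = {rho:5.2f}  ->  Avar(IV) / Avar(OLS) = {ratio:10.1f}")
```

The ratio of the two variances is $1/\rho_{zx}^2$ regardless of $\sigma^2$ and $\sigma_x^2$, so a correlation of 0.1 inflates the asymptotic variance by a factor of about 100.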

5.11. Following the hint, let $y_2^0$ be the linear projection of $y_2$ on $\mathbf{z}_2$, let $a_2$ be the projection
error, and assume that $\boldsymbol{\lambda}_2$ is known. (The results on generated regressors in Section 6.1.1 show
that the argument carries over to the case when $\boldsymbol{\lambda}_2$ is estimated.) Plugging in $y_2 = y_2^0 + a_2$ gives

$$y_1 = \mathbf{z}_1\boldsymbol{\delta}_1 + \alpha_1 y_2^0 + \alpha_1 a_2 + u_1.$$

Effectively, we regress $y_1$ on $\mathbf{z}_1, y_2^0$. The key consistency condition is that each explanatory variable is
orthogonal to the composite error, $\alpha_1 a_2 + u_1$. By assumption, $E(\mathbf{z}_1'u_1) = \mathbf{0}$. Further,
$E(y_2^0 a_2) = 0$ by construction. The problem is that, in general, $E(\mathbf{z}_1'a_2) \neq \mathbf{0}$ because $\mathbf{z}_1$ was not
included in the linear projection for $y_2$. Therefore, OLS will be inconsistent for all parameters
in general. Contrast this conclusion with 2SLS when $y_2^*$ is the projection on $\mathbf{z}_1$ and $\mathbf{z}_2$:

$$y_2 = y_2^* + r_2 = \mathbf{z}\boldsymbol{\pi}_2 + r_2$$
$$E(\mathbf{z}'r_2) = \mathbf{0}.$$

The second step regression (assuming that $\boldsymbol{\pi}_2$ is known) is

$$y_1 = \mathbf{z}_1\boldsymbol{\delta}_1 + \alpha_1 y_2^* + \alpha_1 r_2 + u_1.$$

By construction $r_2$ is uncorrelated with $\mathbf{z}$, and so $E(\mathbf{z}_1'r_2) = \mathbf{0}$ and $E(y_2^* r_2) = 0$.

The lesson is that one must be very careful if manually carrying out 2SLS by explicitly
doing the first- and second-stage regressions: all exogenous variables must be included in the
first stage.
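
A small simulation makes the point concrete. The Python sketch below is an illustration under assumptions that are not in the problem: a made-up data generating process with scalar $z_1$ and $z_2$, true values $\delta_1 = \alpha_1 = 1$, all variables mean zero (so intercepts are omitted), and hypothetical helper names ols1, ols2, and simulate. It compares the incorrect procedure (first stage on $z_2$ only) with a manual 2SLS that includes $z_1$ in the first stage.

```python
import random


def ols1(y, x):
    """Slope from OLS of y on a single regressor, no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ols2(y, x1, x2):
    """OLS of y on two regressors (no intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n=5000, seed=0):
    """Return ((delta, alpha) from the wrong first stage, (delta, alpha) from 2SLS)."""
    rng = random.Random(seed)
    z1, z2, y2, y1 = [], [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)                 # z1: included exogenous variable
        b = 0.5 * a + rng.gauss(0, 1)       # z2: excluded instrument, correlated with z1
        v2 = rng.gauss(0, 1)
        u1 = 0.8 * v2 + rng.gauss(0, 1)     # structural error, correlated with y2
        e = a + b + v2                      # y2: endogenous explanatory variable
        z1.append(a)
        z2.append(b)
        y2.append(e)
        y1.append(1.0 * a + 1.0 * e + u1)   # true delta1 = 1, alpha1 = 1

    # Incorrect: first stage omits z1.
    lam = ols1(y2, z2)
    y2_wrong = [lam * b for b in z2]
    wrong = ols2(y1, z1, y2_wrong)

    # Correct: first stage includes every exogenous variable.
    p1, p2 = ols2(y2, z1, z2)
    y2_fit = [p1 * a + p2 * b for a, b in zip(z1, z2)]
    correct = ols2(y1, z1, y2_fit)
    return wrong, correct


wrong, correct = simulate()
print("first stage on z2 only :", wrong)
print("first stage on z1, z2  :", correct)
```

Under this design the incorrect procedure converges to roughly (2, 0.71) rather than (1, 1), because the omitted $z_1$ ends up in the composite error and is correlated with both second-stage regressors.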

5.12. This problem is essentially proven by the hint. Given the description of $\boldsymbol{\Pi}$, the only
way the $K$ columns of $\boldsymbol{\Pi}$ can be linearly dependent is if the last column can be written as a
linear combination of the first $K - 1$ columns. This is true if and only if each $\theta_j$ is zero. Thus,
if at least one $\theta_j$ is different from zero, $\mathrm{rank}(\boldsymbol{\Pi}) = K$.


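5.13. a. In a simple regression model with a single IV, the IV estimate of the slope can be
written as

$$\hat{\beta}_1 = \left[\sum_{i=1}^N (z_i - \bar{z})(y_i - \bar{y})\right] \Big/ \left[\sum_{i=1}^N (z_i - \bar{z})(x_i - \bar{x})\right] = \left[\sum_{i=1}^N z_i(y_i - \bar{y})\right] \Big/ \left[\sum_{i=1}^N z_i(x_i - \bar{x})\right].$$

Now the numerator can be written as

$$\sum_{i=1}^N z_i(y_i - \bar{y}) = \sum_{i=1}^N z_i y_i - \left(\sum_{i=1}^N z_i\right)\bar{y} = N_1\bar{y}_1 - N_1\bar{y} = N_1(\bar{y}_1 - \bar{y}),$$

where $N_1 = \sum_{i=1}^N z_i$ is the number of observations in the sample with $z_i = 1$ and $\bar{y}_1$ is the
average of the $y_i$ over the observations with $z_i = 1$. Next, write $\bar{y}$ as a weighted average:

$$\bar{y} = (N_0/N)\bar{y}_0 + (N_1/N)\bar{y}_1,$$

where the notation should be clear. Straightforward algebra shows that

$$\bar{y}_1 - \bar{y} = [(N - N_1)/N]\bar{y}_1 - (N_0/N)\bar{y}_0 = (N_0/N)(\bar{y}_1 - \bar{y}_0).$$

Therefore, the numerator of the IV estimate is $(N_0N_1/N)(\bar{y}_1 - \bar{y}_0)$. The same argument shows
that the denominator is $(N_0N_1/N)(\bar{x}_1 - \bar{x}_0)$. Taking the ratio proves the result.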

b. If $x$ is also binary, representing some “treatment,” $\bar{x}_1$ is the fraction of observations
receiving treatment when $z_i = 1$ and $\bar{x}_0$ is the fraction receiving treatment when $z_i = 0$.
Suppose $x_i = 1$ if person $i$ participates in a job training program, and let $z_i = 1$ if person $i$ is
eligible for participation in the program. Then $\bar{x}_1$ is the fraction of people participating in the
program out of those made eligible, and $\bar{x}_0$ is the fraction of people participating who are not
eligible. (When eligibility is necessary for participation, $\bar{x}_0 = 0$.) Generally, $\bar{x}_1 - \bar{x}_0$ is the
difference in participation rates when $z = 1$ and $z = 0$. So the difference in the mean response
between the $z = 1$ and $z = 0$ groups gets divided by the difference in participation rates across
the two groups.
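
The equivalence in part a is easy to verify numerically. The following Python sketch uses a small made-up data set (eight observations, binary instrument and binary treatment) and hypothetical function names; it computes the IV slope from the covariance formula and from the grouping (Wald) formula $(\bar{y}_1 - \bar{y}_0)/(\bar{x}_1 - \bar{x}_0)$.

```python
def mean(values):
    return sum(values) / len(values)


def iv_slope(y, x, z):
    """IV slope: sum (z_i - zbar)(y_i - ybar) / sum (z_i - zbar)(x_i - xbar)."""
    zbar, ybar, xbar = mean(z), mean(y), mean(x)
    num = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    den = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    return num / den


def wald_estimate(y, x, z):
    """Grouping estimator for a binary instrument: (ybar1 - ybar0) / (xbar1 - xbar0)."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    return (mean(y1) - mean(y0)) / (mean(x1) - mean(x0))


# Made-up data: z = eligibility, x = participation, y = outcome.
z = [1, 1, 1, 1, 0, 0, 0, 0]
x = [1, 1, 1, 0, 0, 0, 0, 1]
y = [3.0, 2.5, 4.0, 1.0, 1.5, 0.5, 1.0, 2.0]

print(iv_slope(y, x, z), wald_estimate(y, x, z))
```

In this example the participation rates are 0.75 and 0.25 and the mean outcomes are 2.625 and 1.25, so both formulas give 1.375/0.5 = 2.75.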

5.14. a. Taking the linear projection of (5.1) under the assumption that
$(x_1, \ldots, x_{K-1}, z_1, \ldots, z_M)$ are uncorrelated with $u$ gives

$$L(y|\mathbf{z}) = \beta_0 + \beta_1 x_1 + \ldots + \beta_{K-1}x_{K-1} + \beta_K L(x_K|\mathbf{z}) + L(u|\mathbf{z})$$
$$\;\; = \beta_0 + \beta_1 x_1 + \ldots + \beta_{K-1}x_{K-1} + \beta_K x_K^*$$

because $L(u|\mathbf{z}) = 0$.

b. By the law of iterated projections,

$$L(y|1, x_1, \ldots, x_{K-1}, x_K^*) = \beta_0 + \beta_1 x_1 + \ldots + \beta_{K-1}x_{K-1} + \beta_K x_K^*.$$

Consistency of OLS for the $\beta_j$ from the regression $y$ on $1, x_1, \ldots, x_{K-1}, x_K^*$ follows immediately
from our treatment of OLS from Chapter 4: OLS consistently estimates the parameters in a
linear projection provided there is not perfect collinearity in $(1, x_1, \ldots, x_{K-1}, x_K^*)$.

c. I should have said explicitly to assume $E(\mathbf{z}'\mathbf{z})$ is nonsingular; that is, 2SLS.2a holds.
Then, $x_K^*$ is not a perfect linear combination of $(x_1, \ldots, x_{K-1})$ if and only if at least one element
of $z_1, \ldots, z_M$ has nonzero coefficient in $L(x_K|1, x_1, \ldots, x_{K-1}, z_1, \ldots, z_M)$. In the model with a
single endogenous explanatory variable, we know this condition is equivalent to Assumption
2SLS.2b, the standard rank condition.


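5.15. In $L(\mathbf{x}|\mathbf{z}) = \mathbf{z}\boldsymbol{\Pi}$ we can write

$$\boldsymbol{\Pi} = \begin{pmatrix} \boldsymbol{\Pi}_{11} & \mathbf{0} \\ \boldsymbol{\Pi}_{12} & \mathbf{I}_{K_2} \end{pmatrix},$$

where $\mathbf{I}_{K_2}$ is the $K_2 \times K_2$ identity matrix, $\mathbf{0}$ is the $L_1 \times K_2$ zero matrix, $\boldsymbol{\Pi}_{11}$ is $L_1 \times K_1$, and $\boldsymbol{\Pi}_{12}$
is $K_2 \times K_1$. As in Problem 5.12, the rank condition holds if and only if $\mathrm{rank}(\boldsymbol{\Pi}) = K$.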

a. If for some $x_j$, the vector $\mathbf{z}_1$ does not appear in $L(x_j|\mathbf{z})$, then $\boldsymbol{\Pi}_{11}$ has a column which is
entirely zeros. Then that column of $\boldsymbol{\Pi}$ can be written as a linear combination of the last $K_2$
columns of $\boldsymbol{\Pi}$, because any $K_2 \times 1$ vector in $\boldsymbol{\Pi}_{12}$ can be written as a linear combination of the
columns of $\mathbf{I}_{K_2}$. This implies $\mathrm{rank}(\boldsymbol{\Pi}) < K$. Therefore, a necessary condition for the rank
condition is that no columns of $\boldsymbol{\Pi}_{11}$ be exactly zero, which means that at least one $z_h$ must
appear in the reduced form of each $x_j$, $j = 1, \ldots, K_1$.

b. Suppose $K_1 = 2$ and $L_1 = 2$, where $z_1$ appears in the reduced form for both $x_1$ and $x_2$,
but $z_2$ appears in neither reduced form. Then the $2 \times 2$ matrix $\boldsymbol{\Pi}_{11}$ has zeros in its second row,
which means that the second row of $\boldsymbol{\Pi}$ is all zeros. In that case, it cannot have rank $K$.
Intuitively, while we began with two instruments, only one of them turned out to be partially
correlated with $x_1$ and $x_2$.

c. Without loss of generality, assume that $z_j$ appears in the reduced form for $x_j$; we can
simply reorder the elements of $\mathbf{z}_1$ to ensure this is the case. Then $\boldsymbol{\Pi}_{11}$ is a $K_1 \times K_1$ diagonal
matrix with nonzero diagonal elements. Looking at

$$\boldsymbol{\Pi} = \begin{pmatrix} \boldsymbol{\Pi}_{11} & \mathbf{0} \\ \boldsymbol{\Pi}_{12} & \mathbf{I}_{K_2} \end{pmatrix},$$

we see that if $\boldsymbol{\Pi}_{11}$ is diagonal with all nonzero diagonals then $\boldsymbol{\Pi}$ is lower triangular with all
diagonal elements nonzero. Therefore, $\mathrm{rank}\,\boldsymbol{\Pi} = K$.
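
The three cases can be checked numerically for small matrices. The Python sketch below is only an illustration with made-up entries ($K_1 = L_1 = 2$ and $K_2 = 1$, so $K = 3$); the helper names build_pi and matrix_rank are hypothetical, and the rank is computed by exact Gaussian elimination with rational arithmetic.

```python
from fractions import Fraction


def matrix_rank(matrix):
    """Rank of a matrix by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Find a row at or below position `rank` with a nonzero entry in this column.
        pivot = next((i for i in range(rank, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, n_rows):
            factor = rows[i][col] / rows[rank][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def build_pi(pi11, pi12):
    """Stack [[Pi11, 0], [Pi12, I_K2]] from an L1 x K1 block and a K2 x K1 block."""
    k2 = len(pi12)
    top = [list(row) + [0] * k2 for row in pi11]
    bottom = [list(row) + [1 if i == j else 0 for j in range(k2)]
              for i, row in enumerate(pi12)]
    return top + bottom


pi12 = [[3, 4]]                                   # made-up coefficients on z2 = x2

case_a = build_pi([[1, 0], [2, 0]], pi12)         # a column of Pi11 is zero
case_b = build_pi([[1, 2], [0, 0]], pi12)         # second row of Pi11 is zero
case_c = build_pi([[1, 0], [0, 2]], pi12)         # Pi11 diagonal, nonzero diagonal

for name, pi in (("a", case_a), ("b", case_b), ("c", case_c)):
    print(f"case {name}: rank = {matrix_rank(pi)} (K = 3)")
```

Cases a and b give rank 2, so the rank condition fails, while case c gives full rank 3.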


5.16. a. The discussion below equation (5.24) implies directly that

$$\mathrm{Avar}\,\sqrt{N}(\hat{\beta} - \beta) = \sigma_u^2/\mathrm{Var}(w^*)$$

because there are no other explanatory variables, exogenous or endogenous, in the equation.
Remember, the expression

$$\sigma_u^2[E(\mathbf{x}^{*\prime}\mathbf{x}^*)]^{-1}$$

has the same form as that for OLS but with $\mathbf{x}^*$ replacing $\mathbf{x}$. So any algebra derived for OLS can
be applied to 2SLS.


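b. We can write

$$v = u - \mathbf{h}\boldsymbol{\gamma},$$

so if $E(\mathbf{g}'u) = \mathbf{0}$ and $E(\mathbf{g}'\mathbf{h}) = \mathbf{0}$ then

$$E(\mathbf{g}'v) = E(\mathbf{g}'u) - E(\mathbf{g}'\mathbf{h})\boldsymbol{\gamma} = \mathbf{0}.$$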

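c. For the hint here to be entirely correct, I should have stated that $E(w) = 0$. As we will
see, when $w$ has a nonzero mean, $\tilde{r}$ differs from $w^*$ by an additive constant [which, of course,
implies $\mathrm{Var}(\tilde{r}) = \mathrm{Var}(w^*)$].

Again using the discussion following equation (5.24),

$$\mathrm{Avar}\,\sqrt{N}(\tilde{\beta} - \beta) = \sigma_v^2/\mathrm{Var}(\tilde{r}),$$

where $\sigma_v^2 = \mathrm{Var}(v)$, $\tilde{r}$ is the population residual from the regression $\tilde{w}$ on $1, \mathbf{h}$, and $\tilde{w}$ is the
population fitted value from the linear projection of $w$ on $\mathbf{g}, \mathbf{h}$.

Because $E(\mathbf{g}'\mathbf{h}) = \mathbf{0}$, we can write

$$\tilde{w} = \mathbf{g}\boldsymbol{\pi}_1 + \mathbf{h}\boldsymbol{\pi}_2,$$

where

$$\boldsymbol{\pi}_1 = [E(\mathbf{g}'\mathbf{g})]^{-1}E(\mathbf{g}'w)$$
$$\boldsymbol{\pi}_2 = [E(\mathbf{h}'\mathbf{h})]^{-1}E(\mathbf{h}'w).$$

Note that

$$w^* = L(w|\mathbf{g}) = \mathbf{g}\boldsymbol{\pi}_1.$$

Next,

$$L(\tilde{w}|1,\mathbf{h}) = L(\mathbf{g}\boldsymbol{\pi}_1 + \mathbf{h}\boldsymbol{\pi}_2|1,\mathbf{h}) = L(\mathbf{g}|1,\mathbf{h})\boldsymbol{\pi}_1 + \mathbf{h}\boldsymbol{\pi}_2 = L(\mathbf{g}|1)\boldsymbol{\pi}_1 + \mathbf{h}\boldsymbol{\pi}_2$$

because $E(\mathbf{h}) = \mathbf{0}$ and $E(\mathbf{g}'\mathbf{h}) = \mathbf{0}$ are assumed. Now $L(\mathbf{g}|1) = E(\mathbf{g}) \equiv \boldsymbol{\mu}_g$, and so

$$L(\tilde{w}|1,\mathbf{h}) = \eta_1 + \mathbf{h}\boldsymbol{\pi}_2,$$

where $\eta_1 = \boldsymbol{\mu}_g\boldsymbol{\pi}_1$. Therefore,

$$\tilde{r} = \tilde{w} - L(\tilde{w}|1,\mathbf{h}) = (\mathbf{g}\boldsymbol{\pi}_1 + \mathbf{h}\boldsymbol{\pi}_2) - (\eta_1 + \mathbf{h}\boldsymbol{\pi}_2) = -\eta_1 + \mathbf{g}\boldsymbol{\pi}_1 = -\eta_1 + w^*.$$

It follows that $\mathrm{Var}(\tilde{r}) = \mathrm{Var}(w^*)$ and so we have shown

$$\mathrm{Avar}\,\sqrt{N}(\tilde{\beta} - \beta) = \sigma_v^2/\mathrm{Var}(w^*).$$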
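d. Because $E(\mathbf{h}'v) = \mathbf{0}$ by definition, we have

$$\sigma_u^2 = \mathrm{Var}(\mathbf{h}\boldsymbol{\gamma}) + \sigma_v^2 = \boldsymbol{\gamma}'\boldsymbol{\Sigma}_h\boldsymbol{\gamma} + \sigma_v^2 \geq \sigma_v^2,$$

with strict inequality if $\boldsymbol{\Sigma}_h$ is positive definite and $\boldsymbol{\gamma} \neq \mathbf{0}$ (and even in some cases where
$\boldsymbol{\Sigma}_h = \mathrm{Var}(\mathbf{h})$ is not positive definite). This means that, asymptotically, we generally get a
smaller asymptotic variance for estimating $\beta$ by including exogenous variables that are
uncorrelated with the instruments $\mathbf{g}$:

$$\mathrm{Avar}\,\sqrt{N}(\hat{\beta} - \beta) - \mathrm{Avar}\,\sqrt{N}(\tilde{\beta} - \beta) = \frac{\sigma_u^2 - \sigma_v^2}{\mathrm{Var}(w^*)} = \frac{\boldsymbol{\gamma}'\boldsymbol{\Sigma}_h\boldsymbol{\gamma}}{\mathrm{Var}(w^*)} \geq 0.$$
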
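
As a quick numerical illustration of the efficiency comparison, the sketch below evaluates the two asymptotic variances, $(\boldsymbol{\gamma}'\boldsymbol{\Sigma}_h\boldsymbol{\gamma} + \sigma_v^2)/\mathrm{Var}(w^*)$ when $\mathbf{h}$ is left in the error and $\sigma_v^2/\mathrm{Var}(w^*)$ when it is included. All parameter values are made up, and the function names are hypothetical.

```python
def quad_form(gamma, sigma_h):
    """Compute gamma' Sigma_h gamma for a list gamma and a nested-list matrix."""
    k = len(gamma)
    return sum(gamma[i] * sigma_h[i][j] * gamma[j] for i in range(k) for j in range(k))


def avar_excluding_h(gamma, sigma_h, sigma2_v, var_wstar):
    """Avar when h is left in the error: (gamma' Sigma_h gamma + sigma_v^2) / Var(w*)."""
    return (quad_form(gamma, sigma_h) + sigma2_v) / var_wstar


def avar_including_h(sigma2_v, var_wstar):
    """Avar when h is included as exogenous regressors: sigma_v^2 / Var(w*)."""
    return sigma2_v / var_wstar


# Made-up values for illustration only.
gamma = [1.0, -0.5]
sigma_h = [[1.0, 0.0], [0.0, 4.0]]
sigma2_v, var_wstar = 1.0, 2.0

gap = avar_excluding_h(gamma, sigma_h, sigma2_v, var_wstar) - avar_including_h(sigma2_v, var_wstar)
print(gap, quad_form(gamma, sigma_h) / var_wstar)
```

With these values the quadratic form equals 2, so the two asymptotic variances are 1.5 and 0.5 and the gap is 1.0.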
Solutions to Chapter 6 Problems


6.1. a. Here is abbreviated Stata output for testing the null hypothesis that educ is
exogenous:

. use card

. qui reg educ nearc4 nearc2 exper expersq black south smsa reg661-reg668 smsa66

. predict v2hat, resid

. reg lwage educ exper expersq black south smsa reg661-reg668 smsa66 v2hat

      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 16,  2993) =   80.37
       Model |  178.100803    16  11.1313002
    Residual |  414.540842  2993  .138503455
-------------+------------------------------
       Total |  592.641645  3009  .196956346

----------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t      [95% Conf. Interval]
-------------+--------------------------------------------------------
        educ |   .1570594   .0482814     3.25     .0623912    .2517275
       exper |   .1188149   .0209423     5.67     .0777521    .1598776
     expersq |  -.0023565   .0003191    -7.38    -.0029822   -.0017308
       black |  -.1232778   .0478882    -2.57    -.2171749   -.0293806
       south |  -.1431945   .0261202    -5.48    -.1944098   -.0919791
        smsa |    .100753   .0289435     3.48     .0440018    .1575042
      reg661 |   -.102976   .0398738    -2.58    -.1811588   -.0247932
      reg662 |  -.0002286   .0310325    -0.01    -.0610759    .0606186
      reg663 |   .0469556   .0299809     1.57    -.0118296    .1057408
      reg664 |  -.0554084   .0359807    -1.54    -.1259578    .0151411
      reg665 |   .0515041   .0436804     1.18    -.0341426    .1371509
      reg666 |   .0699968   .0489487     1.43    -.0259797    .1659734
      reg667 |   .0390596   .0456842     0.85     -.050516    .1286352
      reg668 |  -.1980371   .0482417    -4.11    -.2926273   -.1034468
      smsa66 |   .0150626   .0205106     0.73    -.0251538    .0552789
       v2hat |  -.0828005   .0484086    -1.71     -.177718    .0121169
       _cons |   3.339687    .821434     4.07     1.729054    4.950319
----------------------------------------------------------------------


The t statistic on $\hat{v}_2$ is -1.71, which is not significant at the 5% level against a two-sided
alternative. The negative correlation between $u_1$ and educ is essentially the same finding that
the 2SLS estimated return to education is larger than the OLS estimate. In any case, I would
call this marginal evidence that educ is endogenous. The quandary is that the OLS and 2SLS
point estimates are quite different.


b. To test the single overidentifying restriction we obtain the 2SLS residuals:

. qui reg lwage educ exper expersq black south smsa reg661-reg668 smsa66
    (nearc4 nearc2 exper expersq black south smsa reg661-reg668 smsa66)

. predict uhat1, resid

. qui reg uhat1 exper expersq black south smsa reg661-reg668 smsa66 nearc4
    nearc2

. di e(r2)
.00041467

. di 3010*e(r2)
1.2481535

. di chiprob(1, 3010*e(r2))
.26390545

The test statistic is the sample size times the R-squared from this regression, or about 1.25.
The p-value, obtained from the $\chi_1^2$ distribution, is about .264, so the instruments pass the
overidentification test.
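
The final two steps can be reproduced without Stata. The Python sketch below is a cross-check, not part of the original session: it takes the displayed (rounded) R-squared, forms $N \cdot R^2$, and obtains the $\chi_1^2$ p-value from the identity $P(\chi_1^2 > s) = \mathrm{erfc}(\sqrt{s/2})$. The function name chi2_1df_pvalue is hypothetical, and because the R-squared is rounded the statistic matches Stata's only to about five digits.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square(1) variable: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


n_obs = 3010
r_squared = 0.00041467          # R-squared from regressing the 2SLS residuals on all exogenous variables

overid_stat = n_obs * r_squared
p_value = chi2_1df_pvalue(overid_stat)

print(f"N * R-squared = {overid_stat:.4f}")
print(f"p-value       = {p_value:.4f}")
```

One degree of freedom is used because there are two instruments (nearc4 and nearc2) for one endogenous variable, leaving a single overidentifying restriction.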

6.2. We first obtain the reduced form residuals, $\hat{v}_{21}$ and $\hat{v}_{22}$, for educ and IQ, respectively.
The regression output is suppressed:

. qui reg educ exper tenure married south urban black kww meduc feduc sibs

. predict v21hat, resid
(213 missing values generated)

. qui reg iq exper tenure married south urban black kww meduc feduc sibs

. predict v22hat, resid
(213 missing values generated)

. qui reg lwage exper tenure married south urban black educ iq v21hat v22hat

. test v21hat v22hat

 ( 1)  v21hat = 0
 ( 2)  v22hat = 0

       F(  2,   711) =    4.20
            Prob > F =    0.0153

The p-value of the joint F test, which is justified asymptotically, is .0153. Therefore, the
test finds fairly strong evidence for endogeneity of at least one of educ and IQ, although this
conclusion relies on the instruments being truly exogenous. If you look back at Problem 5.8,
this IV solution did not seem to work very well. So we still do not know what should be treated
as exogenous in this method.

6.3. a. We need prices to satisfy two requirements. First, calories and protein must be 
partially correlated with prices of food. While this is easy to test separately by estimating the 
two reduced forms, the rank condition could still be violated. (Problem 5.15c contains a 
sufficient condition for the rank condition to hold.) In addition, we must also assume prices are 
exogenous in the productivity equation. Ideally, prices vary because of things like 
transportation costs that are not systematically related to regional variations in individual 
productivity. A potential problem is that prices reflect food quality and that features of the 
food other than calories and protein appear in the disturbance u1. 

b. Since there are two endogenous explanatory variables we need at least two prices. 

c. We would first estimate the two reduced forms for calories and protein by regressing
each on a constant, exper, exper², educ, and the $M$ prices, $p_1, \ldots, p_M$. We obtain the residuals,
$\hat{v}_{21}$ and $\hat{v}_{22}$. Then we would run the regression log(produc) on 1, exper, exper², educ, $\hat{v}_{21}$, $\hat{v}_{22}$
and do a joint significance test on $\hat{v}_{21}$ and $\hat{v}_{22}$. We could use a standard F test or use a
heteroskedasticity-robust test.


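6.4. a. Since $y = \mathbf{x}\boldsymbol{\beta} + q + v$ it follows that

$$E(y|\mathbf{x}) = \mathbf{x}\boldsymbol{\beta} + E(q|\mathbf{x}) + E(v|\mathbf{x}) = \mathbf{x}\boldsymbol{\beta} + \mathbf{x}\boldsymbol{\delta} = \mathbf{x}(\boldsymbol{\beta} + \boldsymbol{\delta}) \equiv \mathbf{x}\boldsymbol{\gamma}.$$

Since $E(y|\mathbf{x})$ is linear in $\mathbf{x}$ there is no functional form misspecification in this conditional
expectation. Therefore, no functional form test will detect correlation between $q$ and $\mathbf{x}$, no
matter how strong it is: $\boldsymbol{\delta}$ can be anything.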
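
b. Since $E(v|\mathbf{x}, q) = 0$, $\mathrm{Var}(v|\mathbf{x}, q) = E(v^2|\mathbf{x}, q) = \sigma_v^2 = E(v^2|\mathbf{x}) = \mathrm{Var}(v|\mathbf{x})$. Therefore,
$\mathrm{Var}(y|\mathbf{x}) = \mathrm{Var}(q + v|\mathbf{x}) = \mathrm{Var}(q|\mathbf{x}) + \mathrm{Var}(v|\mathbf{x}) + 2E(qv|\mathbf{x})$, where we use
$\mathrm{Cov}(q, v|\mathbf{x}) = E(qv|\mathbf{x})$ because $E(v|\mathbf{x}) = 0$. Now

$$E(qv|\mathbf{x}) = E[E(qv|\mathbf{x}, q)|\mathbf{x}] = E[qE(v|\mathbf{x}, q)|\mathbf{x}] = E[q \cdot 0|\mathbf{x}] = 0.$$

Therefore, $\mathrm{Var}(y|\mathbf{x}) = \mathrm{Var}(q|\mathbf{x}) + \mathrm{Var}(v|\mathbf{x}) = \sigma_q^2 + \sigma_v^2$, so that $y$ is conditionally
homoskedastic. But if $E(y|\mathbf{x}) = \mathbf{x}\boldsymbol{\gamma}$ and $\mathrm{Var}(y|\mathbf{x})$ is constant, a test for heteroskedasticity will
always have a limiting chi-square distribution. It will have no power for detecting omitted
variables.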

c. Since $E(u^2|\mathbf{x}) = \mathrm{Var}(u|\mathbf{x}) + [E(u|\mathbf{x})]^2$ and $\mathrm{Var}(u|\mathbf{x})$ is constant, $E(u^2|\mathbf{x})$ is constant if and
only if $[E(u|\mathbf{x})]^2$ is constant. If $E(u|\mathbf{x}) \neq E(u)$ then $E(u|\mathbf{x})$ is not constant, so $[E(u|\mathbf{x})]^2$
generally will be a function of $\mathbf{x}$. So $E(u^2|\mathbf{x})$ depends on $\mathbf{x}$, which means that $u^2$ can be
correlated with functions of $\mathbf{x}$, say $\mathbf{h}(\mathbf{x})$. It follows that regression tests of the form (6.36) can
be expected, at least in some cases, to detect “heteroskedasticity”. (If the goal is to determine
when heteroskedasticity-robust inference is called for, the regression-based tests do the right
thing.)

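6.5. a. For simplicity, absorb the intercept in $\mathbf{x}$, so $y = \mathbf{x}\boldsymbol{\beta} + u$, $E(u|\mathbf{x}) = 0$, $\mathrm{Var}(u|\mathbf{x}) = \sigma^2$.
In these tests, $\hat{\sigma}^2$ is implicitly $SSR/N$; there is no degrees of freedom adjustment. (In any case,
the df adjustment makes no difference asymptotically.) So $\hat{u}_i^2 - \hat{\sigma}^2$ has a zero sample average,
which means that

$$N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'(\hat{u}_i^2 - \hat{\sigma}^2) = N^{-1/2}\sum_{i=1}^N \mathbf{h}_i'(\hat{u}_i^2 - \hat{\sigma}^2).$$

Next, $N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)' = O_p(1)$ by the central limit theorem and $\hat{\sigma}^2 - \sigma^2 = o_p(1)$. So
$N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'(\hat{\sigma}^2 - \sigma^2) = O_p(1) \cdot o_p(1) = o_p(1)$. Therefore, so far we have

$$N^{-1/2}\sum_{i=1}^N \mathbf{h}_i'(\hat{u}_i^2 - \hat{\sigma}^2) = N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'(\hat{u}_i^2 - \sigma^2) + o_p(1).$$

We are done with this part if we show
$N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'\hat{u}_i^2 = N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'u_i^2 + o_p(1)$. Now, as in Problem 4.4, we can
write $\hat{u}_i^2 = u_i^2 - 2u_i\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) + [\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})]^2$, so

$$N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'\hat{u}_i^2 = N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'u_i^2$$
$$\qquad - 2\left[N^{-1/2}\sum_{i=1}^N u_i(\mathbf{h}_i - \boldsymbol{\mu}_h)'\mathbf{x}_i\right](\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$$
$$\qquad + \left[N^{-1/2}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{x}_i \otimes \mathbf{x}_i)\right]\mathrm{vec}[(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'], \qquad (6.62)$$

where the expression for the third term follows from
$[\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})]^2 = \mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{x}_i' = (\mathbf{x}_i \otimes \mathbf{x}_i)\mathrm{vec}[(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})']$. Dropping the “-2” the
second term can be written as $\left[N^{-1}\sum_{i=1}^N u_i(\mathbf{h}_i - \boldsymbol{\mu}_h)'\mathbf{x}_i\right]\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = o_p(1) \cdot O_p(1)$ because
$\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = O_p(1)$ and, under $E(u_i|\mathbf{x}_i) = 0$, $E[u_i(\mathbf{h}_i - \boldsymbol{\mu}_h)'\mathbf{x}_i] = \mathbf{0}$; the law of large numbers
implies that the sample average is $o_p(1)$. The third term can be written as
$N^{-1/2}\left[N^{-1}\sum_{i=1}^N (\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{x}_i \otimes \mathbf{x}_i)\right]\{\mathrm{vec}[\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})']\} = N^{-1/2} \cdot O_p(1) \cdot O_p(1)$,
where we again use the fact that sample averages are $O_p(1)$ by the law of large numbers and
$\mathrm{vec}[\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})\sqrt{N}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'] = O_p(1)$. We have shown that the last two terms in (6.62) are
$o_p(1)$, which proves part a.

b. By part a, the asymptotic variance of $N^{-1/2}\sum_{i=1}^N \mathbf{h}_i'(\hat{u}_i^2 - \hat{\sigma}^2)$ is
$\mathrm{Var}[(\mathbf{h}_i - \boldsymbol{\mu}_h)'(u_i^2 - \sigma^2)] = E[(u_i^2 - \sigma^2)^2(\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{h}_i - \boldsymbol{\mu}_h)]$. Now
$(u_i^2 - \sigma^2)^2 = u_i^4 - 2u_i^2\sigma^2 + \sigma^4$. Under the null, $E(u_i^2|\mathbf{x}_i) = \mathrm{Var}(u_i|\mathbf{x}_i) = \sigma^2$ [since $E(u_i|\mathbf{x}_i) = 0$
is assumed] and therefore, when we add (6.37), $E[(u_i^2 - \sigma^2)^2|\mathbf{x}_i] = \kappa^2 - \sigma^4 \equiv \eta^2$. A standard
iterated expectations argument gives

$$E[(u_i^2 - \sigma^2)^2(\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{h}_i - \boldsymbol{\mu}_h)] = E\{E[(u_i^2 - \sigma^2)^2(\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{h}_i - \boldsymbol{\mu}_h)|\mathbf{x}_i]\}$$
$$\qquad = E\{E[(u_i^2 - \sigma^2)^2|\mathbf{x}_i](\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{h}_i - \boldsymbol{\mu}_h)\} = \eta^2E[(\mathbf{h}_i - \boldsymbol{\mu}_h)'(\mathbf{h}_i - \boldsymbol{\mu}_h)],$$

which is what we wanted to show. (Whether we carry out the calculation for a random draw $i$
or for random variables representing the population is a matter of taste.)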


c. From part b and Lemma 3.8, the following statistic has an asymptotic $\chi_Q^2$ distribution:

$$\left[N^{-1/2}\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)h_i\right]\{\eta^2 E[(h_i - \mu_h)'(h_i - \mu_h)]\}^{-1}\left[N^{-1/2}\sum_{i=1}^N h_i'(\hat u_i^2 - \hat\sigma^2)\right].$$

Using again the fact that $\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2) = 0$, we can replace $h_i$ with $h_i - \bar h$ in the two vectors forming the quadratic form. Then, again by Lemma 3.8, we can replace the matrix in the quadratic form with a consistent estimator, which is

$$\hat\eta^2\left[N^{-1}\sum_{i=1}^N (h_i - \bar h)'(h_i - \bar h)\right],$$
where $\hat\eta^2 = N^{-1}\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)^2$. The computable statistic, after simple algebra, can be written as

$$\hat\eta^{-2}\left[\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)(h_i - \bar h)\right]\left[\sum_{i=1}^N (h_i - \bar h)'(h_i - \bar h)\right]^{-1}\left[\sum_{i=1}^N (h_i - \bar h)'(\hat u_i^2 - \hat\sigma^2)\right].$$

Now $\hat\eta^2$ is just the total sum of squares of the $\hat u_i^2$ divided by $N$. The numerator of the statistic is simply the explained sum of squares from the regression $\hat u_i^2$ on $1, h_i$, $i = 1, \ldots, N$. Therefore, the test statistic is $N$ times the usual (centered) R-squared from the regression $\hat u_i^2$ on $1, h_i$, $i = 1, \ldots, N$, or $N R_c^2$.

d. Without assumption (6.37) we need to estimate $E[(u_i^2 - \sigma^2)^2(h_i - \mu_h)'(h_i - \mu_h)]$ generally. Hopefully, the approach is by now pretty clear. We replace the population expected value with the sample average and replace any unknown parameters ($\beta$, $\sigma^2$, and $\mu_h$ in this case) with their consistent estimators (under $H_0$). So a generally consistent estimator of $Avar\left(N^{-1/2}\sum_{i=1}^N h_i'(\hat u_i^2 - \hat\sigma^2)\right)$ is

$$N^{-1}\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)^2(h_i - \bar h)'(h_i - \bar h),$$

and the test statistic robust to heterokurtosis can be written as

$$\left[\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)(h_i - \bar h)\right]\left[\sum_{i=1}^N (\hat u_i^2 - \hat\sigma^2)^2(h_i - \bar h)'(h_i - \bar h)\right]^{-1}\left[\sum_{i=1}^N (h_i - \bar h)'(\hat u_i^2 - \hat\sigma^2)\right],$$

which is easily seen to be the explained sum of squares from the regression of 1 on $(\hat u_i^2 - \hat\sigma^2)(h_i - \bar h)$, $i = 1, \ldots, N$ (without an intercept). Since the total sum of squares, without demeaning, of unity is simply $N$, the statistic is equivalent to $N - SSR_0$, where $SSR_0$ is the sum of squared residuals.
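
The equivalences in parts c and d can be checked numerically. The following is only a sketch in Python with NumPy (not Stata), run on simulated data; the function names, the simulated design, and the choice $h(x)$ = the non-constant regressors are all assumptions made for illustration.

```python
import numpy as np

def _prep(u_hat, h):
    """Return u_hat^2, h as an N x Q array, and the two demeaned pieces."""
    u2 = np.asarray(u_hat, dtype=float) ** 2
    h = np.asarray(h, dtype=float).reshape(len(u2), -1)
    d = u2 - u2.mean()        # u_hat_i^2 - sigma_hat^2, with sigma_hat^2 = SSR/N
    hd = h - h.mean(axis=0)   # h_i - h_bar
    return u2, h, d, hd

def nr2_statistic(u_hat, h):
    """Part c: N times the centered R-squared from u_hat^2 on (1, h)."""
    u2, h, d, _ = _prep(u_hat, h)
    n = len(u2)
    X = np.column_stack([np.ones(n), h])
    coef, *_ = np.linalg.lstsq(X, u2, rcond=None)
    ssr = np.sum((u2 - X @ coef) ** 2)
    return n * (1.0 - ssr / np.sum(d ** 2))

def nr2_quadratic_form(u_hat, h):
    """Part c: the same statistic as a quadratic form divided by eta_hat^2."""
    _, _, d, hd = _prep(u_hat, h)
    s = hd.T @ d
    return float(s @ np.linalg.solve(hd.T @ hd, s) / np.mean(d ** 2))

def robust_statistic(u_hat, h):
    """Part d: N - SSR_0 from regressing 1 on (u_hat^2 - sigma_hat^2)(h - h_bar)."""
    _, _, d, hd = _prep(u_hat, h)
    g = d[:, None] * hd                      # N x Q regressors, no intercept
    ones = np.ones(len(d))
    coef, *_ = np.linalg.lstsq(g, ones, rcond=None)
    return len(d) - np.sum((ones - g @ coef) ** 2)

def robust_quadratic_form(u_hat, h):
    """Part d: the heterokurtosis-robust quadratic form written out directly."""
    _, _, d, hd = _prep(u_hat, h)
    g = d[:, None] * hd
    s = g.sum(axis=0)
    return float(s @ np.linalg.solve(g.T @ g, s))

# Simulated illustration (hypothetical data, homoskedastic errors).
rng = np.random.default_rng(0)
n = 500
x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = x @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)
beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
u_hat = y - x @ beta_hat
h = x[:, 1:]

stat_c = nr2_statistic(u_hat, h)
stat_c_qf = nr2_quadratic_form(u_hat, h)
stat_d = robust_statistic(u_hat, h)
stat_d_qf = robust_quadratic_form(u_hat, h)
print(stat_c, stat_c_qf, stat_d, stat_d_qf)
```

Each regression-based statistic should agree with its quadratic form up to rounding error, which is the "simple algebra" referred to in the text.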


6.6. Here is my Stata session using the data NLS80.RAW:

. qui reg lwage exper tenure married south urban black educ
. predict lwageh
(option xb assumed; fitted values)
. gen lwagehsq = lwageh^2
. predict uhat, resid
. gen uhatsq = uhat^2
. reg uhatsq lwageh lwagehsq


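      Source |       SS       df       MS              Number of obs =     935
-------------+------------------------------
       Model |  .288948987     2  .144474493
    Residual |  55.3447136   932   .05938274
-------------+------------------------------
       Total |  55.6336626   934  .059564949

------------------------------------------------------------------------------
      uhatsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lwageh |   3.027285   1.880375     1.61   0.108    -.6629745    6.717544
    lwagehsq |  -.2280088   .1390444    -1.64   0.101    -.5008853    .0448677
       _cons |  -9.901227   6.353656    -1.56   0.119    -22.37036    2.567902
------------------------------------------------------------------------------
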

An asymptotically valid test for heteroskedasticity is just the F statistic for joint significance of $\hat y$ and $\hat y^2$, and this yields p-value = .088 (although this version maintains Assumption (6.37) under the null, along with homoskedasticity). Thus, there is only modest evidence of heteroskedasticity. It could be ignored or heteroskedasticity-robust standard errors and test statistics can be used.


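6.7. a. The simple regression results are:

. use hprice
. reg lprice ldist if y81

      Source |       SS       df       MS
-------------+------------------------------
       Model |  3.86426989     1  3.86426989
    Residual |  17.5730845   140  .125522032

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ldist |   .3648752   .0657613     5.55   0.000     .2348615    .4948889
       _cons |   8.047158   .6462419    12.45   0.000     6.769503    9.324813
------------------------------------------------------------------------------
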


This regression suggests a strong link between housing price and distance from the incinerator (as distance increases, so does housing price). The elasticity is .365 and the t statistic is 5.55. However, this is not a good causal regression: the incinerator may have been put near homes with lower values to begin with. If so, we would expect the positive relationship found in the simple regression even if the new incinerator had no effect on housing prices.

b. The parameter $\delta_3$ should be positive: after the incinerator is built a house should be worth relatively more the farther it is from the incinerator. Here is the Stata session:


. gen y81ldist = y81*ldist
. reg lprice y81 ldist y81ldist

      Source |       SS       df       MS              Number of obs =     321
-------------+------------------------------           F(  3,   317) =   69.22
       Model |  24.3172548     3  8.10575159           Prob > F      =  0.0000
    Residual |  37.1217306   317  .117103251           R-squared     =  0.3958
-------------+------------------------------           Adj R-squared =  0.3901
       Total |  61.4389853   320  .191996829           Root MSE      =   .3422

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y81 |  -.0113101   .8050622    -0.01   0.989     -1.59525     1.57263
       ldist |    .316689   .0515323     6.15   0.000     .2153005    .4180775
    y81ldist |   .0481862   .0817929     0.59   0.556    -.1127394    .2091117
       _cons |   8.058468   .5084358    15.85   0.000     7.058133    9.058803
------------------------------------------------------------------------------


The coefficient on ldist reveals the shortcoming of the regression in part a. This coefficient measures the relationship between lprice and ldist in 1978, before the incinerator was even being rumored. The effect of the incinerator is given by the coefficient on the interaction, y81ldist. While the direction of the effect is as expected, it is not especially large, and it is statistically insignificant, anyway. Therefore, at this point, we cannot reject the null hypothesis that building the incinerator had no effect on housing prices.


c. Adding the variables listed in the problem gives 


. reg lprice y81 ldist y81ldist lintst lintstsq larea lland age agesq rooms baths

      Source |       SS       df       MS              Number of obs =     321
-------------+------------------------------           F( 11,   309) =  108.04
       Model |  48.7611143    11  4.43282858           Prob > F      =  0.0000
    Residual |   12.677871   309  .041028709           R-squared     =  0.7937
-------------+------------------------------           Adj R-squared =  0.7863
       Total |  61.4389853   320  .191996829           Root MSE      =  .20256

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y81 |   -.229847   .4877198    -0.47   0.638    -1.189519    .7298249
       ldist |   .0866424   .0517205     1.68   0.095    -.0151265    .1884113
    y81ldist |   .0617759   .0495705     1.25   0.214    -.0357625    .1593143
      lintst |   .9633332   .3262647     2.95   0.003     .3213517    1.605315
    lintstsq |  -.0591504   .0187723    -3.15   0.002     -.096088   -.0222128
       larea |   .3548562   .0512328     6.93   0.000     .2540468    .4556655
       lland |    .109999   .0248165     4.43   0.000     .0611683    .1588297
         age |  -.0073939   .0014108    -5.24   0.000    -.0101699   -.0046178
       agesq |   .0000315   8.69e-06     3.63   0.000     .0000144    .0000486
       rooms |   .0469214   .0171015     2.74   0.006     .0132713    .0805715
       baths |   .0958867    .027479     3.49   0.001      .041817    .1499564
       _cons |   2.305525   1.774032     1.30   0.195    -1.185185    5.796236
------------------------------------------------------------------------------


The incinerator effect is now larger (the elasticity is about .062) and the t statistic is larger, but the p-value for the interaction term is still fairly large, .214. Against a one-sided alternative, the p-value is .107, so it is almost significant at the 10% level. Still, using these two years of data and controlling for the listed factors, the evidence that housing prices were adversely affected by the new incinerator is somewhat weak.


6.8. a. The following is my Stata session:

. use fertil1
. gen agesq = age^2
. reg kids educ age agesq black east northcen west farm othrural town smcity y74-y84

      Source |       SS       df       MS              Number of obs =    1129
-------------+------------------------------           F( 17,  1111) =    9.72
       Model |  399.610888    17  23.5065228           Prob > F      =  0.0000
    Residual |  2685.89841  1111  2.41755033           R-squared     =  0.1295
-------------+------------------------------           Adj R-squared =  0.1162
       Total |   3085.5093  1128  2.73538059           Root MSE      =  1.5548

------------------------------------------------------------------------------
        kids |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.1284268   .0183486    -7.00   0.000    -.1644286    -.092425
         age |   .5321346   .1383863     3.85   0.000     .2606065    .8036626
       agesq |   -.005804   .0015643    -3.71   0.000    -.0088733   -.0027347
       black |   1.075658   .1735356     6.20   0.000     .7351631    1.416152
        east |    .217324   .1327878     1.64   0.102    -.0432192    .4778672
    northcen |    .363114   .1208969     3.00   0.003      .125902    .6003261
        west |   .1976032   .1669134     1.18   0.237    -.1298978    .5251041
        farm |  -.0525575     .14719    -0.36   0.721    -.3413592    .2362443
    othrural |  -.1628537    .175442    -0.93   0.353    -.5070887    .1813814
        town |   .0843532    .124531     0.68   0.498    -.1599893    .3286957
      smcity |   .2118791    .160296     1.32   0.187    -.1026379    .5263961
         y74 |   .2681825    .172716     1.55   0.121    -.0707039    .6070689
         y76 |  -.0973795   .1790456    -0.54   0.587     -.448685    .2539261
         y78 |  -.0686665   .1816837    -0.38   0.706    -.4251483    .2878154
         y80 |  -.0713053   .1827707    -0.39   0.697      -.42992    .2873093
         y82 |  -.5224842   .1724361    -3.03   0.003    -.8608214    -.184147
         y84 |  -.5451661   .1745162    -3.12   0.002    -.8875846   -.2027477
       _cons |  -7.742457   3.051767    -2.54   0.011    -13.73033   -1.754579
------------------------------------------------------------------------------


The estimate says that a woman with about eight more years of education has about one fewer child (gotten from .128(8) = 1.024), other factors fixed. The coefficient is very statistically significant. Also, there has been a notable secular decline in fertility over this period: on average, with other factors held fixed, a woman in 1984 had about half a child less (.545) than a similar woman in 1972, the base year. The effect is also statistically significant with p-value = .002.


b. Estimating the reduced form for educ gives 


. reg educ age agesq black east northcen west farm othrural town smcity y74-y84 meduc feduc

      Source |       SS       df       MS              Number of obs =    1129
-------------+------------------------------           F( 18,  1110) =   24.82
       Model |  2256.26171    18  125.347873           Prob > F      =  0.0000
    Residual |  5606.85432  1110  5.05122011           R-squared     =  0.2869
-------------+------------------------------           Adj R-squared =  0.2754
       Total |  7863.11603  1128  6.97084755           Root MSE      =  2.2475

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.2243687   .2000013    -1.12   0.262     -.616792    .1680546
       agesq |   .0025664   .0022605     1.14   0.256     -.001869    .0070018
       black |   .3667819   .2522869     1.45   0.146    -.1282311     .861795
        east |   .2488042   .1920135     1.30   0.195    -.1279462    .6255546
    northcen |   .0913945   .1757744     0.52   0.603    -.2534931    .4362821
        west |   .1010676   .2422408     0.42   0.677    -.3742339    .5763691
        farm |  -.3792615   .2143864    -1.77   0.077    -.7999099    .0413869
    othrural |   -.560814   .2551196    -2.20   0.028    -1.061385    -.060243
        town |   .0616337   .1807832     0.34   0.733    -.2930816     .416349
      smcity |   .0806634   .2317387     0.35   0.728    -.3740319    .5353588
         y74 |   .0060993    .249827     0.02   0.981    -.4840872    .4962858
         y76 |   .1239104   .2587922     0.48   0.632    -.3838667    .6316874
         y78 |   .2077861   .2627738     0.79   0.429    -.3078033    .7233755
         y80 |   .3828911   .2642433     1.45   0.148    -.1355816    .9013638
         y82 |   .5820401   .2492372     2.34   0.020     .0930108    1.071069
         y84 |   .4250429   .2529006     1.68   0.093    -.0711741      .92126
       meduc |   .1723015   .0221964     7.76   0.000     .1287499    .2158531
       feduc |   .2074188   .0254604     8.15   0.000     .1574629    .2573747
       _cons |   13.63334   4.396773     3.10   0.002     5.006421    22.26027
------------------------------------------------------------------------------

. test meduc feduc

 ( 1)  meduc = 0
 ( 2)  feduc = 0

       F(  2,  1110) =  155.79
            Prob > F =    0.0000


The joint F test shows that educ is significantly partially correlated with meduc and feduc; the t statistics also show this clearly. If we make the test robust to heteroskedasticity of unknown form, the F statistic drops to 131.37 but the p-value is still zero to four decimal places.

To test the null that educ is exogenous, we need the reduced form residuals and then include them in the OLS regression. I suppress the output here:
. predict v2hat, resid
. reg kids educ age agesq black east northcen west farm othrural town smcity y74-y84 v2hat

      Source |       SS       df       MS              Number of obs =    1129
-------------+------------------------------           F( 18,  1110) =    9.21
       Model |  400.802376    18  22.2667987           Prob > F      =  0.0000
    Residual |  2684.70692  1110  2.41865489           R-squared     =  0.1299
-------------+------------------------------           Adj R-squared =  0.1158
       Total |   3085.5093  1128  2.73538059           Root MSE      =  1.5552

------------------------------------------------------------------------------
        kids |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.1527395   .0392012    -3.90   0.000    -.2296562   -.0758227
         age |   .5235536   .1389568     3.77   0.000     .2509059    .7962013
       agesq |   -.005716   .0015697    -3.64   0.000    -.0087959   -.0026362
       black |   1.072952    .173618     6.18   0.000     .7322958    1.413609
        east |   .2285554   .1337787     1.71   0.088    -.0339322     .491043
    northcen |   .3744188   .1219925     3.07   0.002     .1350569    .6137807
        west |   .2076398   .1675628     1.24   0.216    -.1211357    .5364153
        farm |  -.0770015   .1512869    -0.51   0.611     -.373842    .2198389
    othrural |  -.1952451   .1814491    -1.08   0.282    -.5512671    .1607769
        town |     .08181   .1246122     0.66   0.512     -.162692    .3263119
      smcity |   .2124996    .160335     1.33   0.185    -.1020943    .5270936
         y74 |   .2721292    .172847     1.57   0.116    -.0670145    .6112729
         y76 |  -.0945483   .1791319    -0.53   0.598    -.4460236    .2569269
         y78 |  -.0572543   .1824512                     -.4152424    .3007337
         y80 |   -.053248   .1846139                     -.4154795    .3089836
         y82 |  -.4962149   .1764897                      -.842506   -.1499238
         y84 |  -.5213604   .1778207                     -.8702631   -.1724578
       v2hat |   .0311374   .0443634                     -.0559081    .1181829
       _cons |  -7.241244   3.134883                     -13.39221    -1.09028
------------------------------------------------------------------------------



The t statistic on v2hat is .702, so there is little evidence that educ is endogenous in the equation. Still, we can see if 2SLS produces very different estimates:


. ivreg kids age agesq black east northcen west farm othrural town smcity y74-y84 (educ = meduc feduc)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS
-------------+------------------------------
       Model |   395.36632    17  23.2568424
    Residual |  2690.14298  1111  2.42137082
-------------+------------------------------
       Total |   3085.5093  1128  2.73538059

---------------------------------------------------------------------
        kids |      Coef.   Std. Err.      t     [95% Conf. Interval]
-------------+-------------------------------------------------------
        educ |  -.1527395   .0392232    -3.89    -.2296993   -.0757796
         age |   .5235536   .1390348     3.77     .2507532     .796354
       agesq |   -.005716   .0015705    -3.64    -.0087976   -.0026345
       black |   1.072952   .1737155     6.18      .732105      1.4138
        east |   .2285554   .1338537     1.71    -.0340792    .4911901
    northcen |   .3744188    .122061     3.07     .1349228    .6139148
        west |   .2076398   .1676568     1.24    -.1213199    .5365995
        farm |  -.0770015   .1513718    -0.51    -.3740083    .2200053
    othrural |  -.1952451    .181551    -1.08    -.5514666    .1609764
        town |     .08181   .1246821     0.66     -.162829    .3264489
      smcity |   .2124996    .160425     1.32    -.1022706    .5272698
         y74 |   .2721292    .172944     1.57    -.0672045    .6114629
         y76 |  -.0945483   .1792324    -0.53    -.4462205    .2571239
         y78 |  -.0572543   .1825536    -0.31     -.415443    .3009343
         y80 |   -.053248   .1847175    -0.29    -.4156825    .3091865
         y82 |  -.4962149   .1765888    -2.81       -.8427   -.1497297
         y84 |  -.5213604   .1779205    -2.93    -.8704586   -.1722623
       _cons |  -7.241244   3.136642    -2.31    -13.39565   -1.086834
---------------------------------------------------------------------
Instrumented:  educ
Instruments:   age agesq black east northcen west farm othrural town smcity
               y74 y76 y78 y80 y82 y84 meduc feduc



The estimated coefficient on educ is larger in magnitude than before, but the test for endogeneity shows that we can reasonably attribute the difference between OLS and 2SLS to sampling error.

c. Since there is little evidence that educ is endogenous, we could just use OLS. I did it both ways. First, I just added interactions y74·educ, y76·educ, ..., y84·educ to the model in part a and used OLS. Some of the interactions, particularly in the last two years, are marginally significant and negative, showing that the effect of education has become stronger over time. But the joint F test for the interaction terms yields p-value = .180, and so we do not reject the model without the interactions. Still, the possibility that the link between fertility and education has become stronger over time deserves attention, especially using more recent data.

To estimate the full model by 2SLS, I obtained instruments by interacting all year dummies with both meduc and feduc. The Stata command is then

. ivreg kids age agesq black east northcen west farm othrural town smcity y74-y84 (educ y74educ-y84educ = meduc feduc y74meduc-y84feduc)
. test y74educ y76educ y78educ y80educ y82educ y84educ

Qualitatively, the results are similar to the OLS estimates. The p-value for the joint F test on the interactions is .205 (again, this has asymptotic justification under Assumption 2SLS.3, the homoskedasticity assumption), so again there is no strong evidence favoring inclusion of the interactions of year dummies and education.


6.9. a. The Stata results are

. use injury
. reg ldurat afchnge highearn afhigh male married head-construc if ky

      Source |       SS       df       MS              Number of obs =    5349
-------------+------------------------------           F( 14,  5334) =   16.37
       Model |  358.441793    14  25.6029852           Prob > F      =  0.0000
    Residual |  8341.41206  5334  1.56381928           R-squared     =  0.0412
-------------+------------------------------           Adj R-squared =  0.0387
       Total |  8699.85385  5348  1.62674904           Root MSE      =  1.2505

------------------------------------------------------------------------------
      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     afchnge |   .0106274   .0449167     0.24   0.813    -.0774276    .0986824
    highearn |   .1757598   .0517462                      .0743161    .2772035
      afhigh |   .2308768   .0695248                      .0945798    .3671738
        male |  -.0979407   .0445498                     -.1852766   -.0106049
     married |   .1220995   .0391228                      .0454027    .1987962
        head |  -.5139003   .1292776                     -.7673372   -.2604634
        neck |   .2699126   .1614899                     -.0466737    .5864988
      upextr |   -.178539   .1011794                      -.376892    .0198141
       trunk |   .1264514   .1090163                     -.0872651     .340168
     lowback |  -.0085967   .1015267                     -.2076305    .1904371
     lowextr |              .1023262                     -.3208922    .0803101
      occdis |
       manuf |
    construc |
       _cons |
------------------------------------------------------------------------------



The estimated coefficient on the interaction term is actually higher now (.231) than in equation (6.54), and it has a larger t statistic (3.32 compared with 2.78). Adding the other explanatory variables only slightly increased the standard error on the interaction term.

b. The small R-squared, on the order of 4.1%, or 3.9% if we used the adjusted R-squared, means that we do not explain much of the variation in time on workers compensation using the variables included in the regression. This is often the case in the social sciences: it is very difficult to include the multitude of factors that can affect something like durat. The low R-squared means that making predictions of log(durat) would be very difficult given the factors we have included in the regression: the variation in the unobservables pretty much swamps the explained variation. However, the low R-squared does not mean we have a biased or inconsistent estimator of the effect of the policy change. Provided the Kentucky policy change provides a good natural experiment, the OLS estimator is consistent. With over 5,000 observations, we can get a reasonably precise estimate of the effect, although the 95% confidence interval is pretty wide.


c. Using the data for Michigan to estimate the basic regression gives

. reg ldurat afchnge highearn afhigh if mi

      Source |       SS       df       MS              Number of obs =    1524
-------------+------------------------------           F(  3,  1520) =    6.05
       Model |  34.3850177     3  11.4616726           Prob > F      =  0.0004
    Residual |  2879.96981  1520  1.89471698           R-squared     =  0.0118
-------------+------------------------------           Adj R-squared =  0.0098
       Total |  2914.35483  1523  1.91356194           Root MSE      =  1.3765

------------------------------------------------------------------------------
      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     afchnge |   .0973808   .0847879     1.15   0.251    -.0689329    .2636945
    highearn |   .1691388   .1055676     1.60   0.109    -.0379348    .3762124
      afhigh |   .1919906   .1541699     1.25   0.213    -.1104176    .4943988
       _cons |   1.412737   .0567172    24.91   0.000     1.301485    1.523989
------------------------------------------------------------------------------


The coefficient on the interaction term, .192, is remarkably similar to that for Kentucky. Unfortunately, because of the many fewer observations, the t statistic is insignificant at the 10% level against a one-sided alternative. Asymptotic theory roughly predicts that the standard error for Michigan will be about $(5,626/1,524)^{1/2} \approx 1.92$ times larger than that for Kentucky (assuming the same error variance and same fraction of observations in the different groups). In fact, the ratio of standard errors is about 2.23. The difference in precision in the KY and MI cases shows the importance of a large sample size for this kind of policy analysis.
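
The predicted ratio of standard errors is simple to verify. This is only an arithmetic check in Python (not part of the Stata session); the variable names are made up for illustration, and the sample sizes are the ones quoted in the text.

```python
import math

n_ky = 5626   # Kentucky sample size quoted in the text
n_mi = 1524   # Michigan sample size from the regression output

# With equal error variances and group shares, standard errors shrink at rate
# 1/sqrt(N), so the MI/KY ratio is roughly sqrt(N_KY / N_MI).
predicted_ratio = math.sqrt(n_ky / n_mi)
print(round(predicted_ratio, 2))   # 1.92
```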

6.10. a. As suggested by the hint, we can write $\sqrt{N}(\hat\beta - \beta) = N^{-1/2}\sum_{i=1}^N A^{-1}z_i'u_i$, where $A \equiv E(z'z)$, plus a term we can ignore by the asymptotic equivalence lemma. Further, $\sqrt{N}(\bar x - \mu) = N^{-1/2}\sum_{i=1}^N (x_i - \mu)$. When we stack these two representations, we see that the asymptotic covariance between $\sqrt{N}(\hat\beta - \beta)$ and $\sqrt{N}(\bar x - \mu)$ is $E[A^{-1}z_i'u_i(x_i - \mu)] = A^{-1}E[u_i z_i'(x_i - \mu)]$. Because $E(u_i|x_i) = 0$, the standard iterated expectations argument shows that $E[u_i z_i'(x_i - \mu)] = 0$ because $z_i$ is a function of $x_i$. This completes the proof.

b. While the delta method leads to the same place, it is not needed because of linearity of $\bar x_2$ in the data. We can write $\hat\alpha_1 = \hat\beta_1 + \hat\beta_3\bar x_2 = \hat\beta_1 + \hat\beta_3\mu_2 + \hat\beta_3(\bar x_2 - \mu_2) = \tilde\alpha_1 + \hat\beta_3(\bar x_2 - \mu_2)$, and so $\sqrt{N}(\hat\alpha_1 - \alpha_1) = \sqrt{N}(\tilde\alpha_1 - \alpha_1) + \hat\beta_3[\sqrt{N}(\bar x_2 - \mu_2)]$. Now $\hat\beta_3[\sqrt{N}(\bar x_2 - \mu_2)] = \beta_3[\sqrt{N}(\bar x_2 - \mu_2)] + o_p(1)$ because $\hat\beta_3 - \beta_3 = o_p(1)$ and $\sqrt{N}(\bar x_2 - \mu_2) = O_p(1)$. So we have

$$\sqrt{N}(\hat\alpha_1 - \alpha_1) = \sqrt{N}(\tilde\alpha_1 - \alpha_1) + \beta_3[\sqrt{N}(\bar x_2 - \mu_2)] + o_p(1).$$
By part a, we know that $\sqrt{N}(\hat\beta - \beta)$ and $\sqrt{N}(\bar x_2 - \mu_2)$ are asymptotically jointly normal and asymptotically independent (uncorrelated). Because $\sqrt{N}(\tilde\alpha_1 - \alpha_1)$ is just a deterministic linear combination of $\sqrt{N}(\hat\beta - \beta)$, it follows that $\sqrt{N}(\tilde\alpha_1 - \alpha_1)$ and $\sqrt{N}(\bar x_2 - \mu_2)$ are asymptotically uncorrelated. Therefore,

$$Avar[\sqrt{N}(\hat\alpha_1 - \alpha_1)] = Avar[\sqrt{N}(\tilde\alpha_1 - \alpha_1)] + \beta_3^2 Avar[\sqrt{N}(\bar x_2 - \mu_2)] = Avar[\sqrt{N}(\tilde\alpha_1 - \alpha_1)] + \beta_3^2\sigma_2^2,$$

where $\sigma_2^2 = Var(x_2)$. Therefore, by the convention introduced in Section 3.5, we write
$$Avar(\hat\alpha_1) = Avar(\tilde\alpha_1) + \beta_3^2(\sigma_2^2/N),$$
which is what we wanted to show.

c. As stated in the hint, the standard error we get from the regression in Problem 4.8d is really $se(\tilde\alpha_1)$, as it does not account for the sampling variation in $\bar x_2$. So

$$se(\hat\alpha_1) = \{[se(\tilde\alpha_1)]^2 + \hat\beta_3^2(\hat\sigma_2^2/N)\}^{1/2} = \{[se(\tilde\alpha_1)]^2 + \hat\beta_3^2[se(\bar x_2)]^2\}^{1/2}$$
since $se(\bar x_2) = \hat\sigma_2/\sqrt{N}$.

d. The standard error reported for the education variable in Problem 4.8d, $se(\tilde\alpha_1)$, is about .00698, the coefficient on the interaction term ($\hat\beta_3$) is about .00455, and the sample standard deviation of exper is about 4.375. Plugging these numbers into the formula from part c gives $se(\hat\alpha_1) = [(.00698)^2 + (.00455)^2(4.375)^2/935]^{1/2} \approx .00701$. For practical purposes, this is not much bigger than .00698: the effect of accounting for estimation of the population mean of exper is very modest.
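
The plug-in calculation in part d can be reproduced directly. The snippet below is just an arithmetic check written in Python (not Stata); the variable names are hypothetical and the inputs are the rounded values quoted above.

```python
import math

se_alpha_tilde = 0.00698   # se reported in Problem 4.8d (ignores sampling error in x2-bar)
beta3_hat = 0.00455        # coefficient on the interaction term
sd_exper = 4.375           # sample standard deviation of exper
n_obs = 935

se_xbar = sd_exper / math.sqrt(n_obs)                       # se of the sample mean of exper
se_alpha_hat = math.sqrt(se_alpha_tilde**2 + beta3_hat**2 * se_xbar**2)
print(round(se_alpha_hat, 5))   # 0.00701
```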


6.11. The following is Stata output for answering the first three parts:

. use cps78_85
. reg lwage y85 educ y85educ exper expersq union female y85fem

      Source |       SS       df       MS              Number of obs =    1084
-------------+------------------------------           F(  8,  1075) =   99.80
       Model |  135.992074     8  16.9990092           Prob > F      =  0.0000
    Residual |  183.099094  1075  .170324738           R-squared     =  0.4262
-------------+------------------------------           Adj R-squared =  0.4219
       Total |  319.091167  1083   .29463635           Root MSE      =   .4127

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y85 |   .1178062   .1237817     0.95   0.341     -.125075    .3606874
        educ |   .0747209   .0066764    11.19   0.000     .0616206    .0878212
     y85educ |   .0184605   .0093542     1.97   0.049      .000106     .036815
       exper |   .0295843   .0035673     8.29   0.000     .0225846     .036584
     expersq |  -.0003994   .0000775    -5.15   0.000    -.0005516   -.0002473
       union |   .2021319   .0302945     6.67   0.000     .1426888    .2615749
      female |  -.3167086   .0366215    -8.65   0.000    -.3885663    -.244851
      y85fem |    .085052    .051309     1.66   0.098    -.0156251     .185729
       _cons |   .4589329   .0934485     4.91   0.000     .2755707     .642295
------------------------------------------------------------------------------


a. The return to another year of education increased by about .0185, or 1.85 percentage points, between 1978 and 1985. The t statistic on y85educ is 1.97, which is marginally significant at the 5% level against a two-sided alternative.

b. The coefficient on y85fem is positive and shows that the estimated gender gap declined by about 8.5 percentage points. It is still very large, with the gender difference in lwage in 1985 estimated at about $-.232$. The t statistic on y85fem is only significant at about the 10% level against a two-sided alternative. Still, this is suggestive of some closing of wage differentials between women and men at given levels of education and workforce experience.

c. Only the coefficient on y85 changes if wages are measured in 1978 dollars. In fact, you can check that when 1978 wages are used, the coefficient on y85 becomes about $-.383 = .118 - \log(1.65) \approx .118 - .501$.

d. To answer this question, I just took the squared OLS residuals and regressed those on the year dummy, y85. The coefficient is about .042 with a standard error of about .022, which gives a t statistic of about 1.91. So there is some evidence that the variance of the unexplained part of log wages (or even log real wages) has increased over time.

e. As the equation is written in the problem, the coefficient $\delta_0$ is the growth in nominal wages for a male with no years of education! For a male with 12 years of education, we want $\theta_0 \equiv \delta_0 + 12\delta_1$.

Many packages have simple commands that deliver standard errors and tests for linear combinations. But a general way to obtain the standard error for $\hat\theta_0 = \hat\delta_0 + 12\hat\delta_1$ is to replace y85·educ with y85·(educ − 12) and reestimate the equation. Simple algebra shows that, in the new equation, $\theta_0$ is the coefficient on y85. In Stata we have

. gen y85educ_12 = y85*(educ - 12)
. reg lwage y85 educ y85educ_12 exper expersq union female y85fem

      Source |       SS       df       MS              Number of obs =    1084
-------------+------------------------------           F(  8,  1075) =   99.80
       Model |  135.992074     8  16.9990092           Prob > F      =  0.0000
    Residual |  183.099094  1075  .170324738           R-squared     =  0.4262
-------------+------------------------------           Adj R-squared =  0.4219
       Total |  319.091167  1083   .29463635           Root MSE      =   .4127

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y85 |   .3393326   .0340099     9.98   0.000     .2725993    .4060659
        educ |   .0747209   .0066764    11.19   0.000     .0616206    .0878212
  y85educ_12 |   .0184605   .0093542     1.97   0.049      .000106     .036815
       exper |   .0295843   .0035673     8.29   0.000     .0225846     .036584
     expersq |  -.0003994   .0000775    -5.15   0.000    -.0005516   -.0002473
       union |   .2021319   .0302945     6.67   0.000     .1426888    .2615749
      female |  -.3167086   .0366215    -8.65   0.000    -.3885663    -.244851
      y85fem |    .085052    .051309     1.66   0.098    -.0156251     .185729
       _cons |   .4589329   .0934485     4.91   0.000     .2755707     .642295
------------------------------------------------------------------------------

So the growth in nominal wages for a man with educ = 12 is about .339, or 33.9%. [We could use the more accurate estimate, .404, obtained from exp(.339) − 1 = .404.] The 95% confidence interval goes from about 27.3% to 40.6%.


Stata users can verify that the command

. lincom y85 + 12*y85educ

after estimation of the original equation delivers the same estimate and inference.
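
The algebra behind the recentering trick in part e can also be confirmed numerically. The following is a minimal sketch in Python with NumPy on simulated data (not the CPS78_85 data); the data-generating values and variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
y85 = rng.integers(0, 2, size=n).astype(float)      # hypothetical year dummy
educ = rng.integers(8, 19, size=n).astype(float)    # hypothetical years of schooling
lwage = (0.4 + 0.1 * y85 + 0.07 * educ + 0.02 * y85 * educ
         + rng.normal(scale=0.4, size=n))

def ols(X, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

const = np.ones(n)
# Original parameterization: const, y85, educ, y85*educ
b_orig = ols(np.column_stack([const, y85, educ, y85 * educ]), lwage)
# Recentered parameterization: const, y85, educ, y85*(educ - 12)
b_shift = ols(np.column_stack([const, y85, educ, y85 * (educ - 12)]), lwage)

theta0_from_original = b_orig[1] + 12 * b_orig[3]   # delta0_hat + 12*delta1_hat
theta0_from_shifted = b_shift[1]                    # coefficient on y85 after recentering
print(theta0_from_original, theta0_from_shifted)
```

The two numbers agree because the recentered regressors span the same column space as the original ones; the advantage of the second regression is that it reports the standard error of the combination directly.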

6.12. Under the assumptions listed, $E(x'u) = 0$, $E(z'u) = 0$, and the rank conditions hold for OLS and 2SLS, so we can write

$$\sqrt{N}(\hat\beta_{OLS} - \beta) = A_1^{-1}\left(N^{-1/2}\sum_{i=1}^N x_i'u_i\right) + o_p(1),$$

$$\sqrt{N}(\hat\beta_{2SLS} - \beta) = A_*^{-1}\left(N^{-1/2}\sum_{i=1}^N x_i^{*\prime}u_i\right) + o_p(1),$$
where $A_1 = E(x_i'x_i)$, $A_* = E(x_i^{*\prime}x_i^*)$, and $x_i^* = z_i\Pi$. Further, because of the homoskedasticity assumptions, $E(u_i^2x_i'x_i) = \sigma^2A_1$, $E(u_i^2x_i^{*\prime}x_i^*) = \sigma^2A_*$, and $E(u_i^2x_i^{*\prime}x_i) = \sigma^2E(x_i^{*\prime}x_i)$. But we know from Chapter 5 that $E(x_i^{*\prime}x_i) = A_*$. Next, we can stack the above equations to obtain that OLS and 2SLS, when appropriately centered and scaled, are jointly asymptotically normal with variance-covariance matrix

$$\begin{pmatrix} V_1 & C \\ C' & V_2 \end{pmatrix},$$

where $V_1 = Avar[\sqrt{N}(\hat\beta_{2SLS} - \beta)]$, $V_2 = Avar[\sqrt{N}(\hat\beta_{OLS} - \beta)]$, and $C = A_*^{-1}E(u_i^2x_i^{*\prime}x_i)A_1^{-1} = \sigma^2A_1^{-1}$. Therefore, we can write the asymptotic variance matrix of both estimators as

$$\sigma^2\begin{pmatrix} A_*^{-1} & A_1^{-1} \\ A_1^{-1} & A_1^{-1} \end{pmatrix}.$$

Now, the asymptotic variance of any linear combination is easy to obtain. In particular, the asymptotic variance of $\sqrt{N}(\hat\beta_{2SLS} - \beta) - \sqrt{N}(\hat\beta_{OLS} - \beta)$ is simply $\sigma^2(A_*^{-1} + A_1^{-1} - A_1^{-1} - A_1^{-1}) = \sigma^2A_*^{-1} - \sigma^2A_1^{-1}$, which is the difference in the asymptotic variances, as we wanted to show.
6.13. This is a simple application of the law of iterated expectations. The statement of the problem should add the requirement $\rho_1 \neq 0$. By the LIE,
$$E(u_1|z) = E[E(u_1|z, v_2)|z] = E(\rho_1v_2|z) = \rho_1E(v_2|z)$$
and so if $E(u_1|z) = 0$ then $E(v_2|z) = 0$, too.


6.14. a. First, $y_2$ is a function of $(z, v_2)$, and so, from the structural equation,
$$E(y_1|z, v_2) = z_1\delta_1 + g(y_2)\alpha_1 + E(u_1|z, v_2) = z_1\delta_1 + g(y_2)\alpha_1 + E(u_1|v_2),$$
where $E(u_1|z, v_2) = E(u_1|v_2)$ follows because $(u_1, v_2)$ is independent of $z$. (Note that, in
general, it is not enough to assume that $u_1$ and $v_2$ are separately independent of $z$; joint
independence is needed.)

b. If $E(u_1|v_2) = \rho_1 v_2$ then, under the previous assumptions,
$$E(y_1|z, v_2) = z_1\delta_1 + g(y_2)\alpha_1 + \rho_1 v_2.$$
Therefore, in the first step, we would run OLS of $y_{i2}$ on $z_i$, $i = 1,...,N$, and obtain the OLS
residuals, $\hat{v}_{i2}$. In the second step, we would regress $y_{i1}$ on $z_{i1}$, $g(y_{i2})$, $\hat{v}_{i2}$, $i = 1,...,N$. By the
usual two-step estimation results, all coefficients are $\sqrt{N}$-consistent and asymptotically normal
for the corresponding population parameter. The interesting thing about this method is that, if
$G_1 > 1$, we have more than one endogenous explanatory variable, $g_1(y_2), ..., g_{G_1}(y_2)$, but
adding a single regressor, $\hat{v}_2$, cleans up the endogeneity. This occurs because all endogenous
regressors are a function of $y_2$, and we have assumed $y_2$ is an additive function of $z$ and an
independent error, which pretty much restricts $y_2$ to be continuous. (We can easily replace the
linear function $z\pi_2$ with known nonlinear functions of $z$.)

As specific examples, the second stage regression might be
$$y_{i1} \text{ on } z_{i1},\ y_{i2},\ y_{i2}^2,\ \hat{v}_{i2},\quad i = 1,...,N$$
or
$$y_{i1} \text{ on } z_{i1},\ 1[a_1 < y_{i2} \le a_2],\ ...,\ 1[a_{M-1} < y_{i2} \le a_M],\ 1[y_{i2} > a_M],\ \hat{v}_{i2},\quad i = 1,...,N.$$
In the latter case, dummies for whether $y_{i2}$ falls into one of the intervals
$(-\infty, a_1], (a_1, a_2], ..., (a_{M-1}, a_M], (a_M, \infty)$ appear in the structural model.
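
The two-step control function procedure in part b can be illustrated with a small simulation. The following is only a sketch under assumed values: the data generating process, the parameter values (including $\rho_1 = 0.8$), the sample size, and the helper name `ols` are hypothetical choices made for illustration and do not come from the problem. It uses $g(y_2) = (y_2, y_2^2)$ and a constant as $z_1$.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]


# Hypothetical design: (u1, v2) independent of z, E(u1|v2) = 0.8 * v2.
rng = random.Random(0)
N = 4000
z, y1, y2 = [], [], []
for _ in range(N):
    zi = rng.gauss(0, 1)
    v2 = rng.gauss(0, 1)
    u1 = 0.8 * v2 + rng.gauss(0, 0.5)
    y2i = 0.5 + 1.0 * zi + v2                        # reduced form for y2
    y1i = 1.0 + 0.5 * y2i + 0.25 * y2i ** 2 + u1     # structural equation
    z.append(zi)
    y1.append(y1i)
    y2.append(y2i)

# Step 1: OLS of y2 on (1, z); keep the residuals v2_hat.
pi_hat = ols([[1.0, zi] for zi in z], y2)
v2_hat = [y2i - pi_hat[0] - pi_hat[1] * zi for y2i, zi in zip(y2, z)]

# Step 2: OLS of y1 on (1, y2, y2^2, v2_hat).
cf = ols([[1.0, a, a * a, v] for a, v in zip(y2, v2_hat)], y1)

# For comparison: OLS that ignores the endogeneity of y2.
naive = ols([[1.0, a, a * a] for a in y2], y1)

print("control function:", [round(b, 3) for b in cf])
print("naive OLS:       ", [round(b, 3) for b in naive])
```

In this design the single added regressor should bring the coefficients on both $y_2$ and $y_2^2$ close to their assumed values (0.5 and 0.25), while the OLS coefficient on $y_2$ without the control function is pulled away from 0.5. The sketch reports point estimates only; it does not compute the heteroskedasticity-robust test discussed in part c.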

c. If $\rho_1 = 0$, no adjustment is needed to the asymptotic variance, so we can use the usual t
statistic on $\hat{v}_{i2}$ as a test of endogeneity of $y_2$, where the null is exogeneity: $H_0: \rho_1 = 0$.
Actually, nothing guarantees that $Var(y_1|z, v_2)$ does not depend on $v_2$ (and, under weaker
assumptions, it could also depend on $z$), so there is a good case for making the test robust to
heteroskedasticity.

d. The estimating equation becomes
$$E(y_1|z, v_2) = z_1\delta_1 + g(y_2)\alpha_1 + \rho_1 v_2 + \xi_1(v_2^2 - \tau_2^2)$$
and now, to implement a two-step control function procedure, we obtain $\hat{\tau}_2^2$, the usual OLS
error variance estimate, along with $\hat{\pi}_2$. The residuals are constructed as before,
$\hat{v}_{i2} = y_{i2} - z_i\hat{\pi}_2$. The second-step regression is now
$$y_{i1} \text{ on } z_{i1},\ g(y_{i2}),\ \hat{v}_{i2},\ (\hat{v}_{i2}^2 - \hat{\tau}_2^2),\quad i = 1,...,N.$$
Now we can use a heteroskedasticity-robust Wald test of joint significance of $\hat{v}_{i2}$ and
$(\hat{v}_{i2}^2 - \hat{\tau}_2^2)$. Under the null $H_0: \rho_1 = 0, \xi_1 = 0$, we do not have to adjust the statistic for the
first-step estimation.

e. We would use traditional 2SLS, where we need at least one IV for each $g_j(y_2)$. Methods
for coming up with such IVs are discussed in Section 9.5. Briefly, they will be nonlinear
functions of $z$, which is why $E(u_1|z) = 0$ should be assumed. Generally, we add enough
nonlinear functions, say $h(z)$, to the original instrument list $z$. So, do 2SLS of $y_{i1}$ on $z_{i1}$, $g(y_{i2})$
using IVs $[z_i, h(z_i)]$. 2SLS will be more robust than the method described in part b because the
reduced form for $y_2$ is not restricted in any way, and we need not assume $u_1$ is independent of
$z$.

6.15. a. Because $y_2 = z\pi_2 + v_2$, we can find $E(y_1|z, v_2)$ or $E(y_1|z, y_2)$; they are the same.
Now
$$E(y_1|z, v_2) = z_1\delta_1 + g(z_1, y_2)\alpha_1 + g(z_1, y_2)E(v_1|z, v_2) + E(u_1|z, v_2)$$
$$= z_1\delta_1 + g(z_1, y_2)\alpha_1 + g(z_1, y_2)v_2\theta_1 + \rho_1 v_2.$$

b. The first step is to regress $y_{i2}$ on $z_i$ and get the residuals, $\hat{v}_{i2}$. Second, run the regression
$$y_{i1} \text{ on } z_{i1},\ g(z_{i1}, y_{i2}),\ g(z_{i1}, y_{i2})\hat{v}_{i2},\ \hat{v}_{i2},$$
which means that $\hat{v}_{i2}$ appears by itself and interacted with all elements of $g(z_{i1}, y_{i2})$.

c. The null is $H_0: \theta_1 = 0, \rho_1 = 0$, which means we can compute a
heteroskedasticity-robust Wald test of joint significance of $g(z_{i1}, y_{i2})\hat{v}_{i2}$ and $\hat{v}_{i2}$.

d. For the specific model given, the second-step regression is
$$y_{i1} \text{ on } z_{i1},\ y_{i2},\ y_{i2}^2,\ z_{i1}y_{i2},\ y_{i2}\hat{v}_{i2},\ \hat{v}_{i2},\quad i = 1,...,N.$$
In other words, $\hat{v}_{i2}$ appears by itself and interacted with $y_{i2}$, as in Garen (1984).

Solutions to Chapter 7 Problems


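7.1. Write (with probability approaching one)
$$\hat{\beta} = \beta + \left(N^{-1}\sum_{i=1}^N X_i'X_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N X_i'u_i\right).$$
From Assumption SOLS.2, the weak law of large numbers, and Slutsky's Theorem,
$$\text{plim}\left(N^{-1}\sum_{i=1}^N X_i'X_i\right)^{-1} = A^{-1}.$$
Further, under SOLS.1, the WLLN implies that $\text{plim}\left(N^{-1}\sum_{i=1}^N X_i'u_i\right) = 0$. Thus,
$$\text{plim}(\hat{\beta}) = \beta + \text{plim}\left(N^{-1}\sum_{i=1}^N X_i'X_i\right)^{-1} \cdot \text{plim}\left(N^{-1}\sum_{i=1}^N X_i'u_i\right) = \beta + A^{-1} \cdot 0 = \beta.$$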
7.2. a. Under SOLS.1 and SOLS.2, Theorem 7.2 implies that $Avar(\hat{\beta}_{SOLS}) = A^{-1}BA^{-1}/N$,
where $A = E(X_i'X_i)$ and $B = E(X_i'u_iu_i'X_i)$. But we have assumed that
$E(X_i'u_iu_i'X_i) = E(X_i'\Omega X_i)$, which proves the assertion. Effectively, this is what we can expect
for the asymptotic variance of OLS under the system version of homoskedasticity. [Note that
Assumption SGLS.3 and $E(X_i'u_iu_i'X_i) = E(X_i'\Omega X_i)$ are not the same, but both are implied by
condition (7.53). There are other cases where they reduce to the same assumption, such as in a
SUR model when $\Omega$ is diagonal.]

b. The estimator in (7.28) is always valid. An estimator that uses the structure of
$Avar(\hat{\beta}_{SOLS})$ obtained in part a is obtained as follows. Let $\hat{\Omega} = N^{-1}\sum_{i=1}^N \hat{u}_i\hat{u}_i'$, where the $\hat{u}_i$ are
the $G \times 1$ system OLS residuals. Then
$$\widehat{Avar}(\hat{\beta}_{SOLS}) = \left(\sum_{i=1}^N X_i'X_i\right)^{-1}\left(\sum_{i=1}^N X_i'\hat{\Omega}X_i\right)\left(\sum_{i=1}^N X_i'X_i\right)^{-1}$$
is a valid estimator provided the homoskedasticity assumption holds.

c. Using the hint and dropping the division by N on the right hand side, we have
$$[Avar(\hat{\beta}_{FGLS})]^{-1} - [Avar(\hat{\beta}_{SOLS})]^{-1} = E(X_i'\Omega^{-1}X_i) - E(X_i'X_i)[E(X_i'\Omega X_i)]^{-1}E(X_i'X_i).$$
Define $Z_i = \Omega^{-1/2}X_i$ and $W_i = \Omega^{1/2}X_i$. Then the difference can be written as
$$E(Z_i'Z_i) - E(Z_i'W_i)[E(W_i'W_i)]^{-1}E(W_i'Z_i).$$
Now, define $R_i = Z_i - W_i\Pi$, where $\Pi = [E(W_i'W_i)]^{-1}E(W_i'Z_i)$; $R_i$ is the $G \times K$ matrix of
population residuals from the linear projection of $Z_i$ on $W_i$. Straightforward multiplication
shows that
$$E(Z_i'Z_i) - E(Z_i'W_i)[E(W_i'W_i)]^{-1}E(W_i'Z_i) = E(R_i'R_i),$$
which is necessarily positive semi-definite. We have shown that if (7.53) holds along with
SGLS.1 and the rank conditions for SGLS and SOLS, then FGLS is more efficient than OLS.
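
For readers who want the multiplication step in part c written out, one way to verify it is the following, using only the definitions $R_i = Z_i - W_i\Pi$ and $\Pi = [E(W_i'W_i)]^{-1}E(W_i'Z_i)$ given above:

```latex
\begin{aligned}
E(R_i'R_i) &= E[(Z_i - W_i\Pi)'(Z_i - W_i\Pi)] \\
           &= E(Z_i'Z_i) - \Pi'E(W_i'Z_i) - E(Z_i'W_i)\Pi + \Pi'E(W_i'W_i)\Pi \\
           &= E(Z_i'Z_i) - \Pi'E(W_i'Z_i) \\
           &= E(Z_i'Z_i) - E(Z_i'W_i)[E(W_i'W_i)]^{-1}E(W_i'Z_i).
\end{aligned}
```

The third line uses $\Pi'E(W_i'W_i)\Pi = E(Z_i'W_i)[E(W_i'W_i)]^{-1}E(W_i'W_i)\Pi = E(Z_i'W_i)\Pi$, so the last two terms in the second line cancel; the fourth line substitutes $\Pi' = E(Z_i'W_i)[E(W_i'W_i)]^{-1}$.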

d. If $\Omega = \sigma^2 I_G$,
$$Avar[\sqrt{N}(\hat{\beta}_{SOLS} - \beta)] = [E(X_i'X_i)]^{-1}E(X_i'\Omega X_i)[E(X_i'X_i)]^{-1} = \sigma^2[E(X_i'X_i)]^{-1}$$
and
$$Avar[\sqrt{N}(\hat{\beta}_{FGLS} - \beta)] = [E(X_i'\Omega^{-1}X_i)]^{-1} = [\sigma^{-2}E(X_i'X_i)]^{-1} = \sigma^2[E(X_i'X_i)]^{-1}.$$

e. This statement is true provided we consider only asymptotic efficiency under the
assumption that SGLS.1 holds. In other words, under SGLS.1, the standard rank conditions,
and $E(u_iu_i'|X_i) = \Omega$, there is nothing to lose asymptotically by using FGLS. Of course, SOLS
is more robust in that it only requires SOLS.1 for consistency (and asymptotic normality).
Small sample properties are another issue because it is difficult to characterize the exact
properties of FGLS under general conditions.

7.3. a. Since OLS equation-by-equation is the same as GLS when $\Omega$ is diagonal, it suffices
to show that the GLS estimators for different equations are asymptotically uncorrelated. This
follows if the asymptotic variance matrix is block diagonal (see Section 3.5), where the
blocking is by the parameter vector for each equation. To establish block diagonality, we use
the result from Theorem 7.4: under SGLS.1, SGLS.2, and SGLS.3,
$$Avar[\sqrt{N}(\hat{\beta} - \beta)] = [E(X_i'\Omega^{-1}X_i)]^{-1}.$$
Now, we can use the special form of $X_i$ for SUR (see Example 7.1), the fact that $\Omega$ is
diagonal, and SGLS.3. In the SUR model with diagonal $\Omega$, SGLS.3 implies that
$$E(u_{ig}^2 x_{ig}'x_{ig}) = \sigma_g^2 E(x_{ig}'x_{ig}) \text{ for all } g = 1,...,G, \text{ and}$$
$$E(u_{ig}u_{ih}x_{ig}'x_{ih}) = E(u_{ig}u_{ih})E(x_{ig}'x_{ih}) = 0, \text{ all } g \neq h.$$
Therefore, we have
$$E(X_i'\Omega^{-1}X_i) = \begin{pmatrix} \sigma_1^{-2}E(x_{i1}'x_{i1}) & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sigma_G^{-2}E(x_{iG}'x_{iG}) \end{pmatrix}.$$
When this matrix is inverted, it is also block diagonal. This shows that $Avar[\sqrt{N}(\hat{\beta} - \beta)]$ is
block diagonal, and therefore the $\sqrt{N}(\hat{\beta}_g - \beta_g)$ are asymptotically uncorrelated.

b. To test any linear hypothesis, we can either construct the Wald statistic or we can use
the weighted sum of squared residuals form of the statistic as in (7.56) or (7.57). For the
restricted SSR we must estimate the model with the restriction $\beta_1 = \beta_2$ imposed. See Problem
7.6 for one way to impose general linear restrictions.

c. Actually, for the conclusion to hold about asymptotic equivalence, we need to assume
SGLS.1 along with SOLS.2 and SGLS.2. When $\Omega$ is diagonal in a SUR system, system OLS
and GLS are the same. Under SGLS.1 and SGLS.2, GLS and FGLS are asymptotically
equivalent (regardless of the structure of $\hat{\Omega}$) whether or not SGLS.3 holds. Now if
$\hat{\beta}_{SOLS} = \hat{\beta}_{GLS}$ and $\sqrt{N}(\hat{\beta}_{FGLS} - \hat{\beta}_{GLS}) = o_p(1)$, then $\sqrt{N}(\hat{\beta}_{SOLS} - \hat{\beta}_{FGLS}) = o_p(1)$. Thus, when
$\Omega$ is diagonal, OLS and FGLS are asymptotically equivalent under the exogeneity assumption
SGLS.1, even if $\hat{\Omega}$ is estimated in an unrestricted fashion and even if the system
homoskedasticity assumption SGLS.3 does not hold.

If only SOLS.1 holds, we cannot conclude $\sqrt{N}(\hat{\beta}_{FGLS} - \hat{\beta}_{GLS}) = o_p(1)$, and so
$\sqrt{N}(\hat{\beta}_{SOLS} - \hat{\beta}_{FGLS})$ is not generally $o_p(1)$. It is true that FGLS is still consistent under
SOLS.1 because its plim is
$$\beta + \begin{pmatrix} \sigma_1^{-2}E(x_{i1}'x_{i1}) & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sigma_G^{-2}E(x_{iG}'x_{iG}) \end{pmatrix}^{-1} \begin{pmatrix} \sigma_1^{-2}E(x_{i1}'u_{i1}) \\ \vdots \\ \sigma_G^{-2}E(x_{iG}'u_{iG}) \end{pmatrix}$$
and $E(x_{ig}'u_{ig}) = 0$, $g = 1,...,G$.
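7.4. To make the notation align with the text, use $\check{\beta}$ to denote the SOLS estimator, and let
$\check{u}_i$ denote the $G \times 1$ vector of SOLS residuals that are used in obtaining $\hat{\Omega}$. Then it suffices to
show that
$$N^{-1/2}\sum_{i=1}^N \check{u}_i\check{u}_i' = N^{-1/2}\sum_{i=1}^N u_iu_i' + o_p(1), \qquad (7.82)$$
and this follows if, when we sum across N and divide by $\sqrt{N}$, the last three terms in (7.42) are
$o_p(1)$. Since the third term is the transpose of the second it suffices to consider only the second
and fourth terms. Now
$$N^{-1/2}\sum_{i=1}^N vec[u_i(\check{\beta} - \beta)'X_i'] = N^{-1/2}\sum_{i=1}^N (X_i \otimes u_i)(\check{\beta} - \beta)$$
$$= \left[N^{-1}\sum_{i=1}^N (X_i \otimes u_i)\right]\sqrt{N}(\check{\beta} - \beta) = o_p(1) \cdot O_p(1) = o_p(1).$$
Also,
$$N^{-1/2}\sum_{i=1}^N vec[X_i(\check{\beta} - \beta)(\check{\beta} - \beta)'X_i'] = \left[N^{-1}\sum_{i=1}^N (X_i \otimes X_i)\right] vec\{\sqrt{N}(\check{\beta} - \beta)\sqrt{N}(\check{\beta} - \beta)'\}/\sqrt{N}$$
$$= O_p(1) \cdot O_p(1) \cdot N^{-1/2} = o_p(1).$$
Together, these imply $N^{-1/2}\sum_{i=1}^N \check{u}_i\check{u}_i' = N^{-1/2}\sum_{i=1}^N u_iu_i' + o_p(1)$ and so
$$N^{-1/2}\sum_{i=1}^N (\check{u}_i\check{u}_i' - \Omega) = N^{-1/2}\sum_{i=1}^N (u_iu_i' - \Omega) + o_p(1).$$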


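7.5. This is easy with the hint. Note that
$$\left[\hat{\Omega}^{-1} \otimes \left(\sum_{i=1}^N x_i'x_i\right)\right]^{-1} = \hat{\Omega} \otimes \left(\sum_{i=1}^N x_i'x_i\right)^{-1}.$$
Therefore,
$$\hat{\beta} = \left[\hat{\Omega} \otimes \left(\sum_{i=1}^N x_i'x_i\right)^{-1}\right](\hat{\Omega}^{-1} \otimes I_K)\begin{pmatrix} \sum_{i=1}^N x_i'y_{i1} \\ \vdots \\ \sum_{i=1}^N x_i'y_{iG} \end{pmatrix} = \left[I_G \otimes \left(\sum_{i=1}^N x_i'x_i\right)^{-1}\right]\begin{pmatrix} \sum_{i=1}^N x_i'y_{i1} \\ \vdots \\ \sum_{i=1}^N x_i'y_{iG} \end{pmatrix}$$
$$= \begin{pmatrix} \left(\sum_{i=1}^N x_i'x_i\right)^{-1} & 0 & \cdots & 0 \\ 0 & \left(\sum_{i=1}^N x_i'x_i\right)^{-1} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \left(\sum_{i=1}^N x_i'x_i\right)^{-1} \end{pmatrix}\begin{pmatrix} \sum_{i=1}^N x_i'y_{i1} \\ \sum_{i=1}^N x_i'y_{i2} \\ \vdots \\ \sum_{i=1}^N x_i'y_{iG} \end{pmatrix} = \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_G \end{pmatrix},$$
where $\hat{\beta}_g$ is the OLS estimator for equation g.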
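
The algebra can be checked numerically in the simplest case. The sketch below is only an illustration under assumed inputs: two equations ($G = 2$), a single scalar regressor ($K = 1$), made-up data, and an arbitrary positive definite matrix standing in for $\hat{\Omega}$. The helper names `inv2` and `matvec` are hypothetical.

```python
def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Product of a 2 x 2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


# Made-up data: the same scalar regressor x appears in both equations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y1 = [1.1, 1.9, 3.2, 3.8, 5.1]
y2 = [0.5, 0.1, -0.4, -1.2, -1.4]
omega_hat = [[1.0, 0.6], [0.6, 2.0]]   # any positive definite matrix works

sxx = sum(xi * xi for xi in x)
s = [sum(xi * yi for xi, yi in zip(x, y1)),
     sum(xi * yi for xi, yi in zip(x, y2))]

# With X_i = I_2 (kron) x_i and scalar x_i:
#   sum_i X_i' Omega^{-1} X_i = Omega^{-1} * sxx
#   sum_i X_i' Omega^{-1} y_i = Omega^{-1} * s
omega_inv = inv2(omega_hat)
lhs = [[omega_inv[r][c] * sxx for c in range(2)] for r in range(2)]
rhs = matvec(omega_inv, s)
beta_fgls = matvec(inv2(lhs), rhs)

# OLS equation by equation.
beta_ols = [s[0] / sxx, s[1] / sxx]

print(beta_fgls, beta_ols)
```

Changing `omega_hat` to any other positive definite matrix leaves `beta_fgls` unchanged up to floating-point error, which is the content of the result: with identical regressors in every equation, the weighting matrix drops out.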
7.6. The model for a random draw from the population is $y_i = X_i\beta + u_i$, which can be
written as
$$y_i = X_{i1}\beta_1 + X_{i2}\beta_2 + u_i,$$
where the partition of $X_i$ is defined in the problem. Now, if $\beta_1 = R_1^{-1}(r - R_2\beta_2)$, we just plug
this into the previous equation:
$$y_i = X_{i1}\beta_1 + X_{i2}\beta_2 + u_i = X_{i1}R_1^{-1}(r - R_2\beta_2) + X_{i2}\beta_2 + u_i$$
$$= X_{i1}R_1^{-1}r + (X_{i2} - X_{i1}R_1^{-1}R_2)\beta_2 + u_i.$$
Bringing $X_{i1}R_1^{-1}r$ to the left hand side gives
$$y_i - X_{i1}R_1^{-1}r = (X_{i2} - X_{i1}R_1^{-1}R_2)\beta_2 + u_i.$$
If we define $\tilde{y}_i = y_i - X_{i1}R_1^{-1}r$ and $\tilde{X}_{i2} = X_{i2} - X_{i1}R_1^{-1}R_2$, then we get the desired equation:
$$\tilde{y}_i = \tilde{X}_{i2}\beta_2 + u_i.$$
(Note that $\tilde{y}_i$ and $\tilde{X}_{i2}$ are functions of the data for observation i and the known matrices
$R_1$, $R_2$, and the known vector r.)
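
The transformation can be tried out in a deliberately simple special case. The sketch below is an assumption-laden illustration, not the system FGLS procedure itself: it uses a single equation ($G = 1$) estimated by OLS, two scalar regressors without an intercept, made-up data, and the restriction $\beta_1 + \beta_2 = 1$ (so $R_1 = 1$, $R_2 = 1$, $r = 1$).

```python
# Made-up data for one equation with two scalar regressors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.4, 1.7, 3.3, 3.6, 5.5, 5.4]

# Restriction R1*b1 + R2*b2 = r, here b1 + b2 = 1.
R1, R2, r = 1.0, 1.0, 1.0

# Transformed data: y_tilde = y - x1*R1^{-1}*r, x2_tilde = x2 - x1*R1^{-1}*R2.
y_t = [c - a * r / R1 for a, c in zip(x1, y)]
x2_t = [b - a * R2 / R1 for a, b in zip(x1, x2)]

# Restricted estimate of b2 from regressing y_tilde on x2_tilde;
# b1 is then recovered from the restriction.
b2_r = sum(a * c for a, c in zip(x2_t, y_t)) / sum(a * a for a in x2_t)
b1_r = (r - R2 * b2_r) / R1

# Unrestricted OLS of y on (x1, x2) from the 2 x 2 normal equations.
s11 = sum(a * a for a in x1)
s22 = sum(b * b for b in x2)
s12 = sum(a * b for a, b in zip(x1, x2))
s1y = sum(a * c for a, c in zip(x1, y))
s2y = sum(b * c for b, c in zip(x2, y))
det = s11 * s22 - s12 ** 2
b1_u = (s22 * s1y - s12 * s2y) / det
b2_u = (s11 * s2y - s12 * s1y) / det


def ssr(b1, b2):
    """Sum of squared residuals at the coefficient pair (b1, b2)."""
    return sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(x1, x2, y))


print("restricted:  ", b1_r, b2_r, ssr(b1_r, b2_r))
print("unrestricted:", b1_u, b2_u, ssr(b1_u, b2_u))
```

The restricted estimates satisfy the restriction by construction, and the restricted sum of squared residuals can never be smaller than the unrestricted one. In the system case the same two regressions would be run by FGLS with a common $\hat{\Omega}$.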

This general result is very convenient for computing the weighted SSR form of the F
statistic (under SGLS.3). Let $\hat{\Omega}$ denote the estimate of $\Omega$ based on estimation of the
unconstrained system; typically, $\hat{\Omega} = N^{-1}\sum_{i=1}^N \check{u}_i\check{u}_i'$, where the $\check{u}_i$ are the system OLS residuals.
Using this matrix, we estimate $y_i = X_{i1}\beta_1 + X_{i2}\beta_2 + u_i$ and then $\tilde{y}_i = \tilde{X}_{i2}\beta_2 + u_i$ by FGLS
using $\hat{\Omega}$. Let $\hat{u}_i$ denote the FGLS residuals from the unrestricted model and let $\tilde{u}_i = \tilde{y}_i - \tilde{X}_{i2}\tilde{\beta}_2$
denote the restricted FGLS residuals, where $\tilde{\beta}_2$ is the FGLS estimator from the restricted
estimation. Then the F statistic computed from (7.57) has an approximate $F_{Q,NG-K}$ distribution
under $H_0$ (assuming SGLS.1, SGLS.2, and SGLS.3 hold).

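7.7. a. First, the diagonal elements of $\Omega$ are easily found because
$E(u_{it}^2) = E[E(u_{it}^2|x_{it})] = \sigma_t^2$ by iterated expectations. Now, consider $E(u_{it}u_{is})$, and take $s < t$
without loss of generality. Under $E(u_{it}|x_{it}, u_{i,t-1}, ...) = 0$, $E(u_{it}|u_{is}) = 0$ because $u_{is}$ is a subset
of the larger conditioning set. Applying LIE again we have
$$E(u_{it}u_{is}) = E[E(u_{it}u_{is}|u_{is})] = E[E(u_{it}|u_{is})u_{is}] = 0.$$
So
$$\Omega = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_T^2 \end{pmatrix}.$$

b. The GLS estimator is
$$\beta^* = \left(\sum_{i=1}^N X_i'\Omega^{-1}X_i\right)^{-1}\left(\sum_{i=1}^N X_i'\Omega^{-1}y_i\right) = \left(\sum_{i=1}^N\sum_{t=1}^T \sigma_t^{-2}x_{it}'x_{it}\right)^{-1}\left(\sum_{i=1}^N\sum_{t=1}^T \sigma_t^{-2}x_{it}'y_{it}\right),$$
which is a weighted least squares estimator with every observation for time period t weighted
by $\sigma_t^{-2}$, the inverse of the variance.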

c. If, say, $y_{it} = \beta_0 + \beta_1 y_{i,t-1} + u_{it}$, then $y_{it}$ is clearly correlated with $u_{it}$, which says that
$x_{i,t+1} = y_{it}$ is correlated with $u_{it}$. Thus, SGLS.1 cannot hold. Generally, SGLS.1 fails to hold
whenever there is feedback from $y_{it}$ to $x_{is}$, $s > t$. Nevertheless, because $\Omega^{-1}$ is diagonal,
$X_i'\Omega^{-1}u_i = \sum_{t=1}^T x_{it}'\sigma_t^{-2}u_{it}$, and so
$$E(X_i'\Omega^{-1}u_i) = \sum_{t=1}^T \sigma_t^{-2}E(x_{it}'u_{it}) = 0,$$
where we use $E(x_{it}'u_{it}) = 0$ under $E(u_{it}|x_{it}, u_{i,t-1}, ...) = 0$. It follows that GLS is consistent
in this case without SGLS.1.

d. First, since $\Omega^{-1}$ is diagonal, $X_i'\Omega^{-1} = (\sigma_1^{-2}x_{i1}', \sigma_2^{-2}x_{i2}', ..., \sigma_T^{-2}x_{iT}')$, and so
$$E(X_i'\Omega^{-1}u_iu_i'\Omega^{-1}X_i) = \sum_{t=1}^T\sum_{s=1}^T \sigma_t^{-2}\sigma_s^{-2}E(u_{it}u_{is}x_{it}'x_{is}).$$
First consider the terms for $s \neq t$. Under $E(u_{it}|x_{it}, u_{i,t-1}, ...) = 0$, $E(u_{it}|x_{it}, u_{is}, x_{is}) = 0$ for
$s < t$, and so by the LIE, $E(u_{it}u_{is}x_{it}'x_{is}) = 0$, all $t \neq s$. Next, for each t,
$$E(u_{it}^2 x_{it}'x_{it}) = E[E(u_{it}^2 x_{it}'x_{it}|x_{it})] = E[E(u_{it}^2|x_{it})x_{it}'x_{it}]$$
$$= E(\sigma_t^2 x_{it}'x_{it}) = \sigma_t^2 E(x_{it}'x_{it}), \quad t = 1,2,...,T.$$
It follows that
$$E(X_i'\Omega^{-1}u_iu_i'\Omega^{-1}X_i) = \sum_{t=1}^T \sigma_t^{-2}E(x_{it}'x_{it}) = E(X_i'\Omega^{-1}X_i).$$

e. First, run pooled OLS across all i and t and let $\check{u}_{it}$ denote the pooled OLS residuals.
Then, for each t, define
$$\hat{\sigma}_t^2 = N^{-1}\sum_{i=1}^N \check{u}_{it}^2.$$
(We might replace N with N - K as a degree-of-freedom adjustment.) By standard arguments,
$\hat{\sigma}_t^2 \overset{p}{\to} \sigma_t^2$ as $N \to \infty$.

f. What we need to show is that replacing the $\sigma_t^2$ with the $\hat{\sigma}_t^2$ does not affect the
$\sqrt{N}$-asymptotic distribution of the FGLS estimator. We know this generally under SGLS.1, but
we have relaxed that assumption. To show it holds in the current setting we need to show
$$N^{-1}\sum_{i=1}^N\sum_{t=1}^T \hat{\sigma}_t^{-2}x_{it}'x_{it} = N^{-1}\sum_{i=1}^N\sum_{t=1}^T \sigma_t^{-2}x_{it}'x_{it} + o_p(1)$$
$$N^{-1/2}\sum_{i=1}^N\sum_{t=1}^T \hat{\sigma}_t^{-2}x_{it}'u_{it} = N^{-1/2}\sum_{i=1}^N\sum_{t=1}^T \sigma_t^{-2}x_{it}'u_{it} + o_p(1).$$
The first follows from the consistency of each $\hat{\sigma}_t^2$ using standard arguments we have used
before. The second requirement follows from
$$N^{-1/2}\sum_{i=1}^N\sum_{t=1}^T \hat{\sigma}_t^{-2}x_{it}'u_{it} - N^{-1/2}\sum_{i=1}^N\sum_{t=1}^T \sigma_t^{-2}x_{it}'u_{it} = \sum_{t=1}^T \left(N^{-1/2}\sum_{i=1}^N x_{it}'u_{it}\right)(\hat{\sigma}_t^{-2} - \sigma_t^{-2})$$
$$= \sum_{t=1}^T O_p(1) \cdot o_p(1) = o_p(1)$$
because $N^{-1/2}\sum_{i=1}^N x_{it}'u_{it}$ satisfies the CLT under $E(u_{it}|x_{it}, u_{i,t-1}, ...) = 0$ and second
moment assumptions.

So now we know all inference is as if we are applying pooled OLS to
$$(y_{it}/\sigma_t) = (x_{it}/\sigma_t)\beta + e_{it}, \quad t = 1,2,...,T,$$
where this equation satisfies POLS.1, POLS.2, and POLS.3. Thus, we can use the usual
statistics (standard errors, confidence intervals, t and F statistics) from the regression
$$(y_{it}/\hat{\sigma}_t) \text{ on } (x_{it}/\hat{\sigma}_t), \quad t = 1,...,T;\ i = 1,...,N.$$
For F testing, note that the $\hat{\sigma}_t^2$ should be obtained using the pooled OLS residuals for the
unrestricted model.
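
The steps in parts e and f can be sketched in code. Everything specific below is an assumption made for illustration: a scalar regressor with no intercept, $T = 3$, the true standard deviations (1, 2, 3), the sample size, and the seed. The regressor here is strictly exogenous, so the sketch shows only the mechanics of the estimator, not the feedback case from part c.

```python
import random

rng = random.Random(1)
N, T, beta = 500, 3, 2.0
sigma = [1.0, 2.0, 3.0]   # assumed standard deviations of u_it, t = 1, 2, 3

# data[i][t] = (x_it, y_it)
data = []
for i in range(N):
    row = []
    for t in range(T):
        x = rng.gauss(0, 1)
        y = beta * x + rng.gauss(0, sigma[t])
        row.append((x, y))
    data.append(row)

# Step 1: pooled OLS across all i and t.
b_pols = (sum(x * y for row in data for x, y in row)
          / sum(x * x for row in data for x, y in row))

# Step 2: for each t, average the squared pooled OLS residuals over i.
sig2_hat = [sum((row[t][1] - b_pols * row[t][0]) ** 2 for row in data) / N
            for t in range(T)]

# Step 3: FGLS as weighted least squares with weights 1 / sig2_hat[t].
num = sum(row[t][0] * row[t][1] / sig2_hat[t]
          for row in data for t in range(T))
den = sum(row[t][0] ** 2 / sig2_hat[t]
          for row in data for t in range(T))
b_fgls = num / den

# The same estimate from pooled OLS on data divided by sigma_t hat.
xs = [row[t][0] / sig2_hat[t] ** 0.5 for row in data for t in range(T)]
ys = [row[t][1] / sig2_hat[t] ** 0.5 for row in data for t in range(T)]
b_transformed = sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)

print("sigma_t^2 estimates:", [round(s, 3) for s in sig2_hat])
print("pooled OLS:", round(b_pols, 4), " FGLS:", round(b_fgls, 4))
```

The weighted least squares formula and pooled OLS on the transformed data give the same number, which is why the usual pooled OLS output from the transformed regression can be used for inference.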

g. If $\sigma_t^2 = \sigma^2$ for all $t = 1,...,T$, inference is very easy because the weighted least
squares method reduces to pooled OLS. Thus, we can use the standard errors and test statistics
reported by a standard OLS regression pooled across i and t.


7.8. Here is some Stata output:

. use fringe

. gen hrvac = vacdays/annhrs

. gen hrsick = sicklve/annhrs

. gen hrins = insur/annhrs

. gen hrpens = pension/annhrs

. sureg (hrearn hrvac hrsick hrins hrpens = educ exper expersq tenure tenuresq union south nrtheast nrthcen married white male), corr


Seemingly unrelated regression

-------------------------------------------------------------------------
             |      Coef.   Std. Err.       z      [95% Conf. Interval]
-------------+-----------------------------------------------------------
hrearn       |
        educ |   .4588139    .068393              .3247662    .5928617
       exper |  -.0758428   .0567371             -.1870455    .0353598
     expersq |   .0039945   .0011655              .0017102    .0062787
      tenure |   .1100846   .0829207              -.052437    .2726062
    tenuresq |  -.0050706   .0032422             -.0114252    .0012839
       union |   .8079933   .4034789              .0171892    1.598797
       south |  -.4566222   .5458508              -1.52647    .6132258
    nrtheast |  -1.150759   .5993283              -2.32542    .0239032
     nrthcen |  -.6362663   .5501462             -1.714533    .4420005
     married |   .6423882                         -.167795    1.452571
       white |   1.140891                        -.0457639    2.327546
        male |   1.784702                         1.012897    2.556507
       _cons |  -2.632127                        -5.014054   -.2501997
-------------+-----------------------------------------------------------
hrvac        |
        educ |   .0201829   .0022061               .015859    .0245068
       exper |   .0066493   .0018301              .0030623    .0102363
     expersq |  -.0001492   .0000376             -.0002229   -.0000755
      tenure |    .012386   .0026747              .0071436    .0176284
    tenuresq |  -.0002155   .0001046             -.0004205   -.0000106
       union |   .0637464   .0130148              .0382378    .0892549
       south |  -.0179005   .0176072               -.05241     .016609
    nrtheast |  -.0169824   .0193322             -.0548728    .0209081
     nrthcen |   .0002511   .0177458               -.03453    .0350321
     married |   .0227586   .0133337             -.0033751    .0488923
       white |   .0084869   .0195296             -.0297905    .0467642
        male |   .0569525   .0127021              .0320568    .0818482
       _cons |  -.1842348    .039201             -.2610674   -.1074022
-------------+-----------------------------------------------------------
hrsick       |
        educ |   .0096054   .0009035              .0078346    .0113763
       exper |    .002145   .0007495              .0006759    .0036141
     expersq |  -.0000383   .0000154             -.0000684   -8.08e-06
      tenure |   .0050021   .0010954               .002855    .0071491
    tenuresq |  -.0001391   .0000428             -.0002231   -.0000552
       union |  -.0046655   .0053303             -.0151127    .0057816
       south |   -.011942   .0072111             -.0260755    .0021916
    nrtheast |  -.0026651   .0079176             -.0181833    .0128531
     nrthcen |  -.0222014   .0072679             -.0364462   -.0079567
     married |   .0038338   .0054609             -.0068694     .014537
       white |   .0038635   .0079984             -.0118132    .0195401
        male |   .0042538   .0052022             -.0059423      .01445
       _cons |  -.0937606    .016055    -5.84
-------------+-----------------------------------------------------------
hrins        |
        educ |   .0080042    .002498     3.20
       exper |   .0054052   .0020723     2.61
     expersq |  -.0001266   .0000426    -2.97
      tenure |   .0116978   .0030286     3.86
    tenuresq |  -.0002466   .0001184    -2.08
       union |   .1441536   .0147368     9.78
       south |   .0196786   .0199368     0.99
    nrtheast |  -.0052563   .0218901    -0.24
     nrthcen |   .0242515   .0200937     1.21
     married |   .0365441   .0150979     2.42
       white |   .0378883   .0221136     1.71
        male |   .1120058   .0143827     7.79
       _cons |  -.1180824   .0443877    -2.66
-------------+-----------------------------------------------------------
hrpens       |
        educ |   .0390226   .0039687     9.83
       exper |   .0083791   .0032924     2.55
     expersq |  -.0001595   .0000676    -2.36
      tenure |   .0243758   .0048118     5.07
    tenuresq |  -.0005597   .0001881    -2.97
       union |   .1621404   .0234133     6.93
       south |  -.0130816   .0316749    -0.41
    nrtheast |  -.0323117   .0347781    -0.93
     nrthcen |  -.0408177   .0319241    -1.28
     married |  -.0051755    .023987    -0.22
       white |   .0395839   .0351332     1.13
        male |   .0952459   .0228508     4.17
       _cons |  -.4928338   .0705215    -6.99
-------------------------------------------------------------------------

Correlation matrix of residuals:

          hrearn    hrvac   hrsick    hrins   hrpens
hrearn    1.0000
hrvac     0.2719   1.0000
hrsick    0.2541   0.5762   1.0000
hrins     0.2609   0.6701   0.2922   1.0000
hrpens    0.2786   0.7070   0.4569   0.6345   1.0000

Breusch-Pagan test of independence: chi2(10) = 1393.265, Pr = 0.0000

. test married

 ( 1)  [hrearn]married = 0
 ( 2)  [hrvac]married = 0
 ( 3)  [hrsick]married = 0
 ( 4)  [hrins]married = 0
 ( 5)  [hrpens]married = 0

           chi2(  5) =   14.48
         Prob > chi2 =    0.0128

. lincom [hrpens]educ - [hrins]educ

 ( 1)  - [hrins]educ + [hrpens]educ = 0


The first test shows that there is some evidence that marital status affects at least one of the
five forms of compensation. In fact, it has the largest economic effect on hourly earnings:
.642, but its t statistic is only about 1.54. The most statistically significant effect is on hrins:
.037 with t = 2.42. It is marginally significant and positive for hrvac as well.

The lincom command tests whether another year of education has the same effect on
hrpens and hrins. The t statistic is 10.11 and the p-value is effectively zero. The estimate in the
hrpens equation (with standard error) is .039 (.004) while the estimate in the hrins equation is
.008 (.003). Thus, each is positive and statistically significant, and they are significantly
different from one another.

All of the standard errors and statistics reported above assume that SGLS.3 holds, so that 
there can be no system heteroskedasticity. This is unlikely to hold in this example. 

7.9. The Stata session follows, including a test for serial correlation before computing the
fully robust standard errors:

. use jtrain1

. xtset fcode year
       panel variable:  fcode (strongly balanced)
        time variable:  year, 1987 to 1989
                delta:  1 unit

. reg lscrap d89 grant grant_1 lscrap_1 if year != 1987

      Source |       SS       df       MS              Number of obs =     108
-------------+------------------------------           F(  4,   103) =  153.67
       Model |  186.376973     4  46.5942432           Prob > F      =  0.0000
    Residual |  31.2296502   103  .303200488           R-squared     =  0.8565
-------------+------------------------------           Adj R-squared =  0.8509
       Total |  217.606623   107  2.03370676           Root MSE      =  .55064

------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d89 |  -.1153893   .1199127    -0.96   0.338    -.3532078    .1224292
       grant |  -.1723924   .1257443    -1.37   0.173    -.4217765    .0769918
     grant_1 |  -.1073226   .1610378    -0.67   0.507     -.426703    .2120579
    lscrap_1 |   .8808216   .0357963    24.61   0.000      .809828    .9518152
       _cons |  -.0371354   .0883283    -0.42   0.675    -.2123137     .138043
------------------------------------------------------------------------------


The estimated effect of grant, and its lag, are now the expected sign (if we think the job 
training program should reduce the scrap rate), but neither is strongly statistically significant. 
The variable grant would be if we use a 10% significance level and a one-sided test. The 
results are certainly different from when we omit the lag of log(scrap). 


Now test for AR(1) serial correlation:

. gen uhat_1 = l.uhat
(417 missing values generated)

. reg lscrap grant grant_1 lscrap_1 uhat_1 if d89

      Source |       SS       df       MS              Number of obs =      54
-------------+------------------------------           F(  4,    49) =   73.47
       Model |  94.4746525     4  23.6186631           Prob > F      =  0.0000
    Residual |  15.7530202    49  .321490208           R-squared     =  0.8571
-------------+------------------------------           Adj R-squared =  0.8454
       Total |  110.227673    53  2.07976741           Root MSE      =    .567

------------------------------------------------------------------------------
      lscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |   .0165089    .215732     0.08   0.939    -.4170208    .4500385
     grant_1 |  -.0276544   .1746251    -0.16   0.875    -.3785767    .3232679
    lscrap_1 |   .9204706   .0571831    16.10   0.000     .8055569    1.035384
      uhat_1 |   .2790328   .1576739     1.77   0.083    -.0378247    .5958904
       _cons |   -.232525   .1146314    -2.03   0.048    -.4628854   -.0021646
------------------------------------------------------------------------------


The estimate of $\rho$ is about .28, and it is marginally significant with t = 1.77. (Note we are
relying on asymptotics with N = 54.) One could probably make a case for ignoring the serial
correlation. But it is easy enough to obtain the serial-correlation and heteroskedasticity-robust
standard errors:

. reg lscrap d89 grant grant_1 lscrap_1 if year != 1987, robust cluster(fcode)

Linear regression                                      Number of obs =     108
                                                       F(  4,    53) =   77.24
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8565
                                                       Root MSE      =  .55064

                                 (Std. Err. adjusted for 54 clusters in fcode)
---------------------------------------------------------------
             |               Robust
      lscrap |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+-------------------------------------------------
         d89 |  -.1153893   .1145118     -.3450708    .1142922
       grant |  -.1723924   .1188807     -.4108369    .0660522
     grant_1 |  -.1073226   .1790052     -.4663616    .2517165
    lscrap_1 |   .8808216   .0645344      .7513821    1.010261
       _cons |  -.0371354   .0893147      -.216278    .1420073
---------------------------------------------------------------


The robust standard error for grant is actually smaller than the usual one, while the one for
grant_1 is somewhat larger, but neither variable is statistically significant at the 5% level
against a one-sided alternative. In addition, they are not jointly significant, as the p-value
is about .33:


. test grant grant_1

 ( 1)  grant = 0
 ( 2)  grant_1 = 0

       F(  2,    53) =    1.14
            Prob > F =    0.3266

7.10. The Stata results are:

. use gpa

. reg trmgpa spring cumgpa crsgpa frstsem season sat verbmath hsperc hssize black female

      Source |       SS       df       MS              Number of obs =     732
-------------+------------------------------           F( 11,   720) =   70.64
       Model |  218.156689    11  19.8324263           Prob > F      =  0.0000
    Residual |  202.140267   720  .280750371           R-squared     =  0.5191
-------------+------------------------------           Adj R-squared =  0.5117
       Total |  420.296956   731  .574961636           Root MSE      =  .52986

------------------------------------------------------------------------
      trmgpa |      Coef.   Std. Err.      t       [95% Conf. Interval]
-------------+----------------------------------------------------------
      spring |  -.0121568   .0464813    -0.26     -.1034118    .0790983
      cumgpa |   .3146158   .0404916     7.77      .2351201    .3941115
      crsgpa |   .9840371   .0960343    10.25      .7954964    1.172578
     frstsem |   .7691192   .1204162     6.39      .5327104    1.005528
      season |  -.0462625   .0470985    -0.98     -.1387292    .0462042
         sat |   .0014097   .0001464     9.63      .0011223    .0016972
    verbmath |   -.112616   .1306157    -0.86     -.3690491    .1438171
      hsperc |  -.0066014   .0010195    -6.48     -.0086029   -.0045998
      hssize |  -.0000576   .0000994    -0.58     -.0002527    .0001375
       black |  -.2312855   .0543347    -4.26     -.3379589   -.1246122
      female |   .2855528   .0509641     5.60      .1854967    .3856089
       _cons |  -2.067599   .3381007    -6.12     -2.731381   -1.403818
------------------------------------------------------------------------


. reg trmgpa spring cumgpa crsgpa frstsem season sat verbmath hsperc hssize black female, robust cluster(id)

Linear regression                                      Number of obs =     732
                                                       F( 11,   365) =   71.31
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5191
                                                       Root MSE      =  .52986

                                   (Std. Err. adjusted for 366 clusters in id)
------------------------------------------------------------------------------
             |               Robust
      trmgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      spring |  -.0121568   .0395519    -0.31   0.759     -.089935    .0656215
      cumgpa |   .3146158   .0514364     6.12   0.000     .2134669    .4157647
      crsgpa |   .9840371     .09182    10.72   0.000     .8034745      1.1646
     frstsem |   .7691192   .1437178     5.35   0.000     .4865003    1.051738
      season |  -.0462625   .0431631    -1.07   0.285     -.131142     .038617
         sat |   .0014097   .0001743     8.09   0.000      .001067    .0017525
    verbmath |   -.112616   .1495196    -0.75   0.452    -.4066441    .1814121
      hsperc |  -.0066014   .0011954    -5.52   0.000    -.0089522   -.0042506
      hssize |  -.0000576   .0001066    -0.54   0.589    -.0002673    .0001521
       black |  -.2312855   .0695278    -3.33   0.001     -.368011   -.0945601
      female |   .2855528   .0511767     5.58   0.000     .1849146     .386191
       _cons |  -2.067599   .3327336    -6.21   0.000    -2.721915   -1.413284
------------------------------------------------------------------------------


Some of the fully robust standard errors are actually smaller than the corresponding
nonrobust standard errors, although the one on cumgpa is quite a bit larger, and drops the t
statistic from 7.77 to 6.12. No variable that was statistically significant based on the usual t
statistic becomes statistically insignificant, although the lengths of some confidence intervals
change. The t statistics for the key variable, season, are similar and show season is not
statistically significant.

7.11. a. The following Stata output should be self-explanatory. There is clearly strong
positive serial correlation in the errors of the static model ($\hat{\rho} = .792$, $t_{\hat{\rho}} = 28.84$) and the fully
robust standard errors are much larger than the nonrobust ones. Note, for example, that the t
statistic on the log of the conviction probability, lprbconv, goes from -20.69 to -7.75.


. use cornwell

. xtset county year
       panel variable:  county (strongly balanced)
        time variable:  year, 81 to 87
                delta:  1 unit

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87

      Source |       SS       df       MS              Number of obs =     630
-------------+------------------------------           F( 11,   618) =   74.49
       Model |  117.644669    11  10.6949699           Prob > F      =  0.0000
    Residual |   88.735673   618  .143585231           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5624
       Total |  206.380342   629  .328108652           Root MSE      =  .37893

------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t       [95% Conf. Interval]
-------------+----------------------------------------------------------
     lprbarr |  -.7195033   .0367657   -19.57     -.7917042   -.6473024
    lprbconv |  -.5456589   .0263683   -20.69     -.5974413   -.4938765
    lprbpris |   .2475521   .0672268     3.68      .1155314    .3795728
     lavgsen |  -.0867575   .0579205    -1.50     -.2005023    .0269872
      lpolpc |   .3659886   .0300252    12.19      .3070248    .4249525
         d82 |   .0051371    .057931     0.09     -.1086284    .1189026
         d83 |   -.043503   .0576243    -0.75     -.1566662    .0696601
         d84 |  -.1087542    .057923    -1.88      -.222504    .0049957
         d85 |  -.0780454   .0583244    -1.34     -.1925835    .0364928
         d86 |  -.0420791   .0578218    -0.73       -.15563    .0714719
         d87 |  -.0270426    .056899    -0.48     -.1387815    .0846963
       _cons |  -2.082293   .2516253    -8.28     -2.576438   -1.588149
------------------------------------------------------------------------

. predict uhat, resid

. gen uhat_1 = l.uhat
(90 missing values generated)

. reg uhat uhat_1

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------
       Model |  46.6680407     1  46.6680407
    Residual |  30.1968286   538  .056127934
-------------+------------------------------
       Total |  76.8648693   539  .142606437

------------------------------------------------------------------------------
        uhat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      uhat_1 |   .7918085     .02746    28.84   0.000     .7378666    .8457504
       _cons |   1.74e-10   .0101951     0.00   1.000    -.0200271    .0200271
------------------------------------------------------------------------------

. reg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87, cluster(county)

Linear regression                                      Number of obs =     630
                                                       F( 11,    89) =   37.19
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5700
                                                       Root MSE      =  .37893

                                (Std. Err. adjusted for 90 clusters in county)
---------------------------------------------------------------
             |               Robust
     lcrmrte |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+-------------------------------------------------
     lprbarr |  -.7195033   .1095979     -.9372719   -.5017347
    lprbconv |  -.5456589   .0704368     -.6856152   -.4057025
    lprbpris |   .2475521   .1088453      .0312787    .4638255
     lavgsen |  -.0867575   .1130321     -.3113499    .1378348
      lpolpc |   .3659886    .121078      .1254092    .6065681
         d82 |   .0051371   .0367296     -.0678439    .0781181
         d83 |   -.043503    .033643     -.1103509    .0233448
         d84 |  -.1087542   .0391758     -.1865956   -.0309127
         d85 |  -.0780454   .0385625     -.1546683   -.0014224
         d86 |  -.0420791   .0428788     -.1272783    .0431201
         d87 |  -.0270426   .0381447     -.1028353    .0487502
       _cons |  -2.082293   .8647054     -3.800445   -.3641423
---------------------------------------------------------------

. drop uhat uhat_1


b. We lose the first year, 1981, when we add the lag of log(crmrte):

. gen lcrmrte_1 = l.lcrmrte
(90 missing values generated)

. reg lcrmrte lcrmrte_1 lprbarr lprbconv lprbpris lavgsen lpolpc d83-d87

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------
       Model |  163.287174    11  14.8442885
    Residual |  16.8670945   528  .031945255
-------------+------------------------------
       Total |  180.154268   539  .334237975

------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t       [95% Conf. Interval]
-------------+----------------------------------------------------------
   lcrmrte_1 |   .8263047   .0190806    43.31      .7888214    .8637879
     lprbarr |  -.1668349   .0229405    -7.27     -.2119007   -.1217691
    lprbconv |  -.1285118   .0165096    -7.78     -.1609444   -.0960793
    lprbpris |  -.0107492   .0345003    -0.31      -.078524    .0570255
     lavgsen |  -.1152298    .030387    -3.79      -.174924   -.0555355
      lpolpc |    .101492   .0164261     6.18      .0692234    .1337606
         d83 |  -.0649438   .0267299    -2.43     -.1174537   -.0124338
         d84 |  -.0536882   .0267623    -2.01     -.1062619   -.0011145
         d85 |  -.0085982   .0268172    -0.32     -.0612797    .0440833
         d86 |   .0420159    .026896     1.56     -.0108203    .0948522
         d87 |   .0671272   .0271816     2.47      .0137298    .1205245
       _cons |  -.0304828   .1324195    -0.23     -.2906166     .229651
------------------------------------------------------------------------


Not surprisingly, the coefficient on the lagged crime rate is very large and statistically
significant. Further, including it makes all other coefficients much smaller in magnitude. The
variable log(prbpris) now has a negative sign, although it is insignificant. Adding the lagged
crime rate does not change the positive coefficient on the size of the police force: it is smaller
but now even more statistically significant.

c. There is little evidence of serial correlation in the model with a lagged dependent
variable. The coefficient on $\hat{u}_{i,t-1}$ is small and statistically insignificant:


. predict uhat, resid
(90 missing values generated)

. gen uhat_1 = l.uhat
(180 missing values generated)

. reg lcrmrte lcrmrte_1 lprbarr lprbconv lprbpris lavgsen lpolpc d84-d87 uhat_1

      Source |       SS       df       MS              Number of obs =     450
-------------+------------------------------           F( 11,   438) =  370.77
       Model |  138.488359    11  12.5898508           Prob > F      =  0.0000
    Residual |  14.8729012   438  .033956395           R-squared     =  0.9030
-------------+------------------------------           Adj R-squared =  0.9006
       Total |   153.36126   449  .341561826           Root MSE      =  .18427

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   lcrmrte_1 |    .829714   .0248121    33.44   0.000     .7809485    .8784796
     lprbarr |  -.1576381   .0278786    -5.65   0.000    -.2124305   -.1028457
    lprbconv |  -.1293032   .0191735    -6.74   0.000    -.1669868   -.0916197
    lprbpris |  -.0040031   .0395191    -0.10   0.919    -.0816738    .0736675
     lavgsen |  -.1241479    .034481    -3.60   0.000    -.1919166   -.0563791
      lpolpc |   .1107055   .0187613     5.90   0.000     .0738323    .1475788
         d84 |   .0103772   .0277393     0.37   0.709    -.0441415    .0648959
         d85 |   .0557956   .0277577     2.01   0.045     .0012407    .1103505
         d86 |    .107831   .0277087     3.89   0.000     .0533724    .1622895
         d87 |   .1333345   .0279635     4.77   0.000     .0783751    .1882938
      uhat_1 |  -.0592978   .0601101    -0.99   0.324     -.177438    .0588423
       _cons |   .0126059   .1524765     0.08   0.934    -.2870706    .3122823
------------------------------------------------------------------------------


d. None of the log(wage) variables is statistically significant, and the magnitudes are 
pretty small in all cases. The p-value for the joint test, made fully robust, is .33, which means 


the log(wage) variables are jointly insignificant, too. (Plus, the different signs on the wage 


variables is hard to explain, except to conclude that each is estimated with substantial sampling 
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error.) 


reg lcrmrte lcrmrte_1 lprbarr lprbconv lprbpris lavgsen lpolpc d83-d87 lwcon-lwloc, cluster(county)

Linear regression                                      Number of obs =     540
                                                       F( 20,    89) =  398.63
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9077
                                                       Root MSE      =  .17895

                          (Std. Err. adjusted for 90 clusters in county)
--------------------------------------------------------------------
             |               Robust
     lcrmrte |      Coef.   Std. Err.       [95% Conf. Interval]
-------------+------------------------------------------------------
   lcrmrte_1 |   .8087768   .0406432        .7280195     .889534
     lprbarr |  -.1746053   .0495539       -.2730678   -.0761428
    lprbconv |  -.1337714   .0289031       -.1912012   -.0763415
    lprbpris |  -.0195318   .0404094       -.0998243    .0607608
     lavgsen |  -.1108926   .0455404       -.2013804   -.0204049
      lpolpc |   .1050704   .0575404       -.0092612     .219402
         d83 |  -.0729231   .0293628       -.1312664   -.0145799
         d84 |  -.0652494   .0226239       -.1102026   -.0202962
         d85 |  -.0258059   .0413435       -.1079545    .0563428
         d86 |   .0263763   .0393741       -.0518591    .1046118
         d87 |   .0465632   .0441727        -.041207    .1343334
       lwcon |  -.0283133   .0272813       -.0825207     .025894
       lwtuc |  -.0034567   .0208431       -.0448715    .0379582
       lwtrd |   .0121236   .0496718       -.0865733    .1108205
       lwfir |   .0296003   .0184296       -.0070189    .0662195
       lwser |    .012903   .0269695       -.0406847    .0664908
       lwmfg |  -.0409046   .0508117       -.1418664    .0600573
       lwfed |   .1070534   .0760639        -.044084    .2581908
       lwsta |  -.0903894   .0548237        -.199323    .0185442
       lwloc |   .0961124   .1355681       -.1732585    .3654833
       _cons |  -.6438061   .7958054       -2.225055    .9374423
--------------------------------------------------------------------

 ( 1)  lwcon = 0
 ( 2)  lwtuc = 0
 ( 3)  lwtrd = 0
 ( 4)  lwfir = 0
 ( 5)  lwser = 0
 ( 6)  lwmfg = 0
 ( 7)  lwfed = 0
 ( 8)  lwsta = 0
 ( 9)  lwloc = 0

       F(  9,    89) =    1.15
            Prob > F =    0.3338

7.12. Wealth at the beginning of the year cannot be strictly exogenous in a savings equation: if saving increases unexpectedly this year, so that the disturbance in year t is positive, beginning-of-year wealth is higher next year. This is analogous to Example 7.8, where cumulative grade point average at the start of the semester cannot be strictly exogenous in an equation to explain current-term GPA.


7.13. a. The Stata output is below. Married men are estimated to have a scoring average about 1.2 points higher, and assists are .42 higher. The coefficient in the rebounds equation is -.24, but it is not statistically significant. The coefficient in the assists equation is significant at the 5% level against a two-sided alternative (p-value = .048).


use nbasal

sureg (points rebounds assists = age exper expersq coll guard forward black marr)

Seemingly unrelated regression
----------------------------------------------------------
Equation            RMSE    "R-sq"       chi2        P
----------------------------------------------------------
points          5.352116    0.1750      59.80   0.0000
rebounds        2.375338    0.3123     128.07   0.0000
assists          1.64516    0.3727     167.51   0.0000
----------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
points       |
         age |  -1.214936               -4.39   0.000    -1.756984   -.6728889
       exper |   2.261943                6.02   0.000     1.525138    2.998747
     expersq |  -.0649631               -3.17   0.002    -.1051347   -.0247915
        coll |  -1.011535               -2.51   0.012    -1.800649   -.2224201
       guard |                           2.13   0.033     .1584603
     forward |                           1.44   0.151    -.4916863
       black |                           1.81   0.071    -.1247815
        marr |                           1.77   0.076    -.1293911
       _cons |                           5.14   0.000     21.55662
-------------+----------------------------------------------------------------
rebounds     |
         age |  -.2818077   .1227409    -2.30   0.022    -.5223754     -.04124
       exper |    .830967   .1668415     4.98   0.000     .5039637     1.15797
     expersq |  -.0344878   .0090964    -3.79   0.000    -.0523165   -.0166591
        coll |  -.3689707   .1786866    -2.06   0.039      -.71919   -.0187514
       guard |  -2.727081   .4163206    -6.55   0.000    -3.543054   -1.911107
     forward |   .0896382   .4167631     0.22   0.830    -.7272024    .9064789
       black |   1.003824   .3626705     2.77   0.006     .2930028    1.714645
        marr |  -.2406585    .309188    -0.78   0.436    -.8466559    .3653389
       _cons |   10.87601   3.005231     3.62   0.000     4.985864    16.76615
-------------+----------------------------------------------------------------
assists      |
         age |  -.3013925   .0850104    -3.55   0.000    -.4680097   -.1347752
       exper |   .6633331   .1155545     5.74   0.000     .4368506    .8898157
     expersq |  -.0222961   .0063002    -3.54   0.000    -.0346442    -.009948
        coll |  -.1894703   .1237584    -1.53   0.126    -.4320323    .0530916
       guard |   2.478626   .2883437     8.60   0.000     1.913482    3.043769
     forward |   .4804238   .2886502     1.66   0.096    -.0853202    1.046168
       black |  -.1528242   .2511857    -0.61   0.543     -.645139    .3394907
        marr |   .4236511   .2141437     1.98   0.048     .0039371     .843365
       _cons |   7.501437   2.081423     3.60   0.000     3.421922    11.58095
------------------------------------------------------------------------------


b. The Stata test command gives

test marr

 ( 1)  [points]marr = 0
 ( 2)  [rebounds]marr = 0
 ( 3)  [assists]marr = 0

           chi2(  3)
         Prob > chi2

The rejection is very strong, presumably coming mainly from the points and assists equations. Rather than thinking that being married causes a basketball player to be more productive, it could be that the more productive players (at least when it comes to points and assists) are more likely to be married.

7.14. Let $\hat{\beta}$ be the estimator that uses $\hat{\Omega}$ and let $\tilde{\beta}$ be the estimator that uses $\hat{\Lambda}$. Because SGLS.1 to SGLS.3 hold,

$$\mathrm{Avar}[\sqrt{N}(\hat{\beta} - \beta)] = [E(X_i'\Omega^{-1}X_i)]^{-1}.$$

Further, we know from the general result for FGLS,

$$\mathrm{Avar}[\sqrt{N}(\tilde{\beta} - \beta)] = [E(X_i'\Lambda^{-1}X_i)]^{-1}E(X_i'\Lambda^{-1}u_iu_i'\Lambda^{-1}X_i)[E(X_i'\Lambda^{-1}X_i)]^{-1}.$$

Now, because $E(u_iu_i'|X_i) = \Omega$, it follows that

$$E(X_i'\Lambda^{-1}u_iu_i'\Lambda^{-1}X_i) = E(X_i'\Lambda^{-1}\Omega\Lambda^{-1}X_i)$$

by a simple iterated expectations argument. So, we have to show that

$$[E(X_i'\Lambda^{-1}X_i)]^{-1}E(X_i'\Lambda^{-1}\Omega\Lambda^{-1}X_i)[E(X_i'\Lambda^{-1}X_i)]^{-1} - [E(X_i'\Omega^{-1}X_i)]^{-1}$$

is positive semi-definite. We use the standard trick of showing that

$$E(X_i'\Omega^{-1}X_i) - E(X_i'\Lambda^{-1}X_i)[E(X_i'\Lambda^{-1}\Omega\Lambda^{-1}X_i)]^{-1}E(X_i'\Lambda^{-1}X_i)$$

is positive semi-definite. To this end, define $Z_i \equiv \Omega^{-1/2}X_i$ and $W_i \equiv \Omega^{1/2}\Lambda^{-1}X_i$. Then straightforward algebra shows that the difference above can be written as $E(Z_i'Z_i) - E(Z_i'W_i)[E(W_i'W_i)]^{-1}E(W_i'Z_i)$, which is easily seen to be $E(R_i'R_i)$, where $R_i$ is the $G \times K$ matrix of population residuals from the regression of $Z_i$ on $W_i$: $R_i = Z_i - W_i\Pi$, where $\Pi = [E(W_i'W_i)]^{-1}E(W_i'Z_i)$. Matrices of the form $E(R_i'R_i)$ are always positive semi-definite because for a nonrandom vector $a$, $a'E(R_i'R_i)a = E[(R_ia)'(R_ia)] \ge 0$.
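The ranking can be checked numerically in the simplest case. The following is a minimal sketch, not part of the original solution: it assumes G = 2 and K = 1 (so every moment is a scalar), a hypothetical variance matrix `omega`, and the wrong choice Λ = I₂, which turns the second estimator into system OLS. All names and numbers are illustrative.

```python
import random

random.seed(2)
N = 1000

# Hypothetical 2 x 2 error variance matrix Omega and its inverse (computed by hand).
omega = [[1.0, 0.5], [0.5, 2.0]]
det = omega[0][0] * omega[1][1] - omega[0][1] * omega[1][0]
omega_inv = [[omega[1][1] / det, -omega[0][1] / det],
             [-omega[1][0] / det, omega[0][0] / det]]

def quad(x, M):
    """Return x' M x for a 2-vector x and a 2 x 2 matrix M."""
    return sum(x[j] * M[j][k] * x[k] for j in (0, 1) for k in (0, 1))

# With K = 1 each X_i is a 2 x 1 vector, so all three moments are scalars.
# The "wrong" matrix is Lambda = I_2, so Lambda^{-1} drops out of the formulas.
m_gls = m_xx = m_mid = 0.0
for _ in range(N):
    x = [random.gauss(1, 1), random.gauss(0, 2)]
    m_gls += quad(x, omega_inv) / N      # estimates E(X' Omega^{-1} X)
    m_xx += (x[0] ** 2 + x[1] ** 2) / N  # estimates E(X' Lambda^{-1} X)
    m_mid += quad(x, omega) / N          # estimates E(X' Lambda^{-1} Omega Lambda^{-1} X)

avar_correct = 1.0 / m_gls        # [E(X' Omega^{-1} X)]^{-1}
avar_wrong = m_mid / m_xx ** 2    # sandwich form for the estimator that uses Lambda
print(avar_correct <= avar_wrong)
```

In this scalar case the inequality is just the Cauchy-Schwarz inequality applied to the sample moments, so it holds in every sample, not only in the limit.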

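7.15. Let $\hat{\theta} = (\hat{\beta}', \hat{\gamma}')'$ be the FGLS estimator from the full model. Then, because SGLS.1 through SGLS.3 hold, we know

$$\mathrm{Avar}[\sqrt{N}(\hat{\theta} - \theta)] = [E(W_i'\Omega^{-1}W_i)]^{-1}$$

where $W_i = (X_i, Z_i)$. Using partitioned matrix multiplication,

$$E(W_i'\Omega^{-1}W_i) = \begin{pmatrix} E(X_i'\Omega^{-1}X_i) & E(X_i'\Omega^{-1}Z_i) \\ E(Z_i'\Omega^{-1}X_i) & E(Z_i'\Omega^{-1}Z_i) \end{pmatrix}.$$

Further, because $E(X_i \otimes Z_i) = 0$, it follows that $E(X_i'\Omega^{-1}Z_i) = 0$. Therefore, $E(W_i'\Omega^{-1}W_i)$ is block diagonal and is equal to

$$\begin{pmatrix} E(X_i'\Omega^{-1}X_i) & 0 \\ 0 & E(Z_i'\Omega^{-1}Z_i) \end{pmatrix}.$$

Inverting this matrix gives

$$\mathrm{Avar}[\sqrt{N}(\hat{\theta} - \theta)] = \begin{pmatrix} [E(X_i'\Omega^{-1}X_i)]^{-1} & 0 \\ 0 & [E(Z_i'\Omega^{-1}Z_i)]^{-1} \end{pmatrix}$$

and $\mathrm{Avar}[\sqrt{N}(\hat{\beta} - \beta)]$ is the upper left hand block:

$$\mathrm{Avar}[\sqrt{N}(\hat{\beta} - \beta)] = [E(X_i'\Omega^{-1}X_i)]^{-1}.$$

Now let $\tilde{\beta}$ be the FGLS estimator from $y_i = X_i\beta + v_i$. We know this estimator is consistent for $\beta$ because $v_i = Z_i\gamma + u_i$, and so

$$E(X_i \otimes v_i) = 0$$

because $E(X_i \otimes Z_i) = 0$ and $E(u_i|X_i, Z_i) = 0$. Now, FGLS of $y_i = X_i\beta + v_i$ using a consistent estimator of $\Lambda = E(v_iv_i')$ generally has asymptotic variance

$$[E(X_i'\Lambda^{-1}X_i)]^{-1}E(X_i'\Lambda^{-1}v_iv_i'\Lambda^{-1}X_i)[E(X_i'\Lambda^{-1}X_i)]^{-1}.$$

Let $r_i = Z_i\gamma$ so that we can write

$$v_iv_i' = (r_i + u_i)(r_i + u_i)' = r_ir_i' + r_iu_i' + u_ir_i' + u_iu_i'.$$

Now $E(r_iu_i'|X_i) = 0$ because $E(u_i|X_i, Z_i) = 0$ and $r_i$ is a function of $Z_i$. Therefore,

$$E(v_iv_i'|X_i) = E(r_ir_i'|X_i) + E(u_iu_i'|X_i) = E(r_ir_i'|X_i) + \Omega.$$

Using iterated expectations,

$$\begin{aligned}
E(X_i'\Lambda^{-1}v_iv_i'\Lambda^{-1}X_i) &= E[E(X_i'\Lambda^{-1}v_iv_i'\Lambda^{-1}X_i|X_i)] = E[X_i'\Lambda^{-1}E(v_iv_i'|X_i)\Lambda^{-1}X_i] \\
&= E[X_i'\Lambda^{-1}E(r_ir_i'|X_i)\Lambda^{-1}X_i] + E(X_i'\Lambda^{-1}\Omega\Lambda^{-1}X_i) \\
&= E(X_i'\Lambda^{-1}r_ir_i'\Lambda^{-1}X_i) + E(X_i'\Lambda^{-1}\Omega\Lambda^{-1}X_i).
\end{aligned}$$

We have shown that

$$\begin{aligned}
\mathrm{Avar}[\sqrt{N}(\tilde{\beta} - \beta)] &= A_2^{-1}\{E(X_i'\Lambda^{-1}r_ir_i'\Lambda^{-1}X_i) + E(X_i'\Lambda^{-1}\Omega\Lambda^{-1}X_i)\}A_2^{-1} \\
&= A_2^{-1}E(X_i'\Lambda^{-1}r_ir_i'\Lambda^{-1}X_i)A_2^{-1} + A_2^{-1}E(X_i'\Lambda^{-1}\Omega\Lambda^{-1}X_i)A_2^{-1} \\
&= C_2 + A_2^{-1}E(X_i'\Lambda^{-1}\Omega\Lambda^{-1}X_i)A_2^{-1}
\end{aligned}$$

where $A_2 = E(X_i'\Lambda^{-1}X_i)$ and $C_2 = A_2^{-1}E(X_i'\Lambda^{-1}r_ir_i'\Lambda^{-1}X_i)A_2^{-1}$. Note that $C_2$ is positive semi-definite.

Now, Problem 7.14 established that

$$A_2^{-1}E(X_i'\Lambda^{-1}\Omega\Lambda^{-1}X_i)A_2^{-1} - A_1^{-1}$$

is positive semi-definite, where $A_1 = E(X_i'\Omega^{-1}X_i)$. Therefore,

$$\mathrm{Avar}[\sqrt{N}(\tilde{\beta} - \beta)] - \mathrm{Avar}[\sqrt{N}(\hat{\beta} - \beta)] = C_2 + [A_2^{-1}E(X_i'\Lambda^{-1}\Omega\Lambda^{-1}X_i)A_2^{-1} - A_1^{-1}]$$

and each matrix is positive semi-definite. We have shown the result.

Interestingly, the proof shows that the asymptotic inefficiency of $\tilde{\beta}$ has two sources. First, we have omitted variables that are uncorrelated with $X_i$. The second piece is due to using the wrong variance matrix, $\Lambda$. If we could effectively use $\Omega$ in obtaining the estimator with $Z_i$ omitted (which we can in principle if we observe $Z_i$), then the only source of inefficiency would be due to omitting $Z_i$ (as happens in the single-equation case).

Solutions to Chapter 8 Problems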
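8.1. Letting $Q(b)$ denote the objective function in equation (8.27), it follows from multivariable calculus that

$$\frac{\partial Q(b)'}{\partial b} = -2\left(\sum_{i=1}^N Z_i'X_i\right)'\hat{W}\left(\sum_{i=1}^N Z_i'(y_i - X_ib)\right).$$

Evaluating the derivative at the solution $\hat{\beta}$ gives

$$\left(\sum_{i=1}^N Z_i'X_i\right)'\hat{W}\left(\sum_{i=1}^N Z_i'(y_i - X_i\hat{\beta})\right) = 0.$$

In terms of full data matrices, we can write, after simple algebra,

$$(X'Z\hat{W}Z'X)\hat{\beta} = (X'Z\hat{W}Z'Y).$$

Solving for $\hat{\beta}$ gives (8.28).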
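As a numerical illustration of the first-order condition and the closed form, here is a minimal sketch on simulated data. It assumes one regressor and two instruments (K = 1, L = 2) and uses the weighting matrix (Z'Z/N)^{-1}; the data-generating process and all variable names are hypothetical.

```python
import random

random.seed(0)
N = 500
beta_true = 1.5

# Simulated data: one regressor x (K = 1) and two instruments z1, z2 (L = 2).
rows = []
for _ in range(N):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    u = random.gauss(0, 1)
    x = 0.8 * z1 + 0.5 * z2 + 0.6 * u + random.gauss(0, 1)  # x is endogenous
    y = beta_true * x + u
    rows.append((z1, z2, x, y))

# Cross-product sums: Z'X and Z'Y are 2 x 1, Z'Z is 2 x 2.
zx = [sum(r[j] * r[2] for r in rows) for j in (0, 1)]
zy = [sum(r[j] * r[3] for r in rows) for j in (0, 1)]
zz = [[sum(r[j] * r[k] for r in rows) for k in (0, 1)] for j in (0, 1)]

# Weighting matrix W = (Z'Z/N)^{-1}, the system 2SLS choice (2 x 2 inverse by hand).
a, b, c, d = zz[0][0] / N, zz[0][1] / N, zz[1][0] / N, zz[1][1] / N
det = a * d - b * c
W = [[d / det, -b / det], [-c / det, a / det]]

def quad(v, M, w):
    """Return v' M w for 2-vectors v, w and a 2 x 2 matrix M."""
    return sum(v[j] * M[j][k] * w[k] for j in (0, 1) for k in (0, 1))

# Closed form in the scalar case: b = (X'Z W Z'X)^{-1} (X'Z W Z'Y).
beta_hat = quad(zx, W, zy) / quad(zx, W, zx)

# First-order condition: (X'Z) W Z'(Y - X b) should be zero at b = beta_hat.
resid_moment = [zy[j] - zx[j] * beta_hat for j in (0, 1)]
foc = quad(zx, W, resid_moment)
print(round(beta_hat, 3), abs(foc) < 1e-6)
```

Any other positive definite choice of `W` would satisfy the same first-order condition at its own solution; only the value of `beta_hat` would change.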

8.2. a. We can apply general GMM theory to obtain consistency and $\sqrt{N}$-asymptotic normality of the 3SLS estimator (GMM version). The four assumptions given in the problem are sufficient for SIV.1 to SIV.3, where $\hat{W} = \left(N^{-1}\sum_{i=1}^N Z_i'\hat{\Omega}Z_i\right)^{-1}$ and $W = [E(Z_i'\Omega Z_i)]^{-1} = \mathrm{plim}(\hat{W})$. (This assumes $\mathrm{plim}\,\hat{\Omega} = \Omega = E(u_iu_i')$, something that holds quite generally.) However, without SIV.5, 3SLS is not necessarily an asymptotically efficient GMM estimator.

b. The asymptotic variance of the 3SLS estimator is given in equation (8.29) with the choice of $W$ in part a:

$$\mathrm{Avar}\,\sqrt{N}(\hat{\beta}_{3SLS} - \beta) = (C'WC)^{-1}(C'W\Lambda WC)(C'WC)^{-1},$$

where $\Lambda = E(Z_i'u_iu_i'Z_i)$, as in the text. (Note this expression collapses to $(C'WC)^{-1}$ when $\Lambda = W^{-1}$, as happens under SIV.5.)

c. A consistent estimator of $\mathrm{Avar}\,\sqrt{N}(\hat{\beta}_{3SLS} - \beta)$ is given in equation (8.31) with $\hat{\Lambda} = N^{-1}\sum_{i=1}^N Z_i'\hat{u}_i\hat{u}_i'Z_i$ and $\hat{u}_i = y_i - X_i\hat{\beta}_{3SLS}$, the 3SLS residuals:

$$[(X'Z/N)\hat{W}(Z'X/N)]^{-1}(X'Z/N)\hat{W}\left(N^{-1}\sum_{i=1}^N Z_i'\hat{u}_i\hat{u}_i'Z_i\right)\hat{W}(Z'X/N)[(X'Z/N)\hat{W}(Z'X/N)]^{-1}.$$

The estimator of $\mathrm{Avar}(\hat{\beta}_{3SLS})$ is simply this expression divided by $N$. Even though the formula looks complicated, it can be programmed fairly easily in a matrix-based language. Of course, if we doubt SIV.5 in the first place, we would probably use the more general minimum chi-square estimator, as it is asymptotically more efficient. (If we were going to obtain the robust variance matrix estimate for 3SLS anyway, it is no harder to obtain the minimum chi-square estimate and its asymptotic variance estimate.)
8.3. First, we can always write $x$ as its linear projection plus an error: $x = x^* + e$, where $x^* = z\Pi$ and $E(z'e) = 0$. Therefore, $E(z'x) = E(z'x^*)$, which verifies the first part of the hint.

To verify the second step, let $h \equiv h(z)$, and write the linear projection as

$$L(x|z,h) = z\Pi_1 + h\Pi_2$$

where $\Pi_1$ is $M \times K$ and $\Pi_2$ is $Q \times K$. Then we must show that $\Pi_2 = 0$. But, from the two-step projection theorem (see Property LP.7 in Chapter 2),

$$\Pi_2 = [E(s's)]^{-1}E(s'r), \text{ where } s \equiv h - L(h|z) \text{ and } r \equiv x - L(x|z).$$

Now, by the assumption that $E(x|z) = L(x|z)$, $r$ is also equal to $x - E(x|z)$. Therefore, $E(r|z) = 0$, and so $r$ is uncorrelated with all functions of $z$. But $s$ is simply a function of $z$ since $h \equiv h(z)$. Therefore, $E(s'r) = 0$, and this shows that $\Pi_2 = 0$.


8.4. a. For the system in (8.12), we show that, for each $g$, $\mathrm{rank}\,E[(z,h)'x_g] = \mathrm{rank}\,E(z'x_g)$ for any function $h = h(z)$. Now, by Problem 8.3, $L(x_g|z,h) = L(x_g|z) = z\Pi_1$ when $E(x_g|z)$ is linear in $z$ and $h$ is any function of $z$. As in Problem 8.3, $E(z'x_g) = E(z'x_g^*) = E(z'z)\Pi_1$. Also, if we let $e_g = x_g - x_g^*$, then $E(h'e_g) = 0$, and so $E[(z,h)'x_g] = E[(z,h)'x_g^*] = E[(z,h)'z]\Pi_1$. But $\mathrm{rank}\,E[(z,h)'z] = \mathrm{rank}\,E(z'z)$, which means that $\mathrm{rank}\,E[(z,h)'z]\Pi_1 = \mathrm{rank}\,E(z'z)\Pi_1$. We have shown that $\mathrm{rank}\,E[(z,h)'x_g] = \mathrm{rank}\,E(z'x_g)$, which means adding $h$ to the instrument list does not help satisfy the rank condition.

b. If $E(x_g|z)$ is nonlinear in $z$, then $L(x_g|z,h)$ will generally depend on $h$. This can certainly help in satisfying the rank condition. For example, if $K_g > M$ (the dimension of $z$) then the order condition fails for equation $g$ using instruments $z$. But we can add nonlinear functions of $z$ to the instrument list that are partially correlated with $x_g$ and satisfy the order and rank condition. We use this fact in Section 9.5.

8.5. This follows directly from the hint. Straightforward matrix algebra shows that $(C'\Lambda^{-1}C) - (C'WC)(C'W\Lambda WC)^{-1}(C'WC)$ can be written as

$$C'\Lambda^{-1/2}[I_L - D(D'D)^{-1}D']\Lambda^{-1/2}C,$$

where $D \equiv \Lambda^{1/2}WC$. Since this is a matrix quadratic form in the $L \times L$ symmetric, idempotent matrix $I_L - D(D'D)^{-1}D'$, it is necessarily itself positive semi-definite.
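For readers who want the intermediate algebra behind the 8.5 quadratic form spelled out, the following sketch fills it in. It uses only the definition $D \equiv \Lambda^{1/2}WC$ and the symmetry of $W$ and $\Lambda^{1/2}$.

```latex
\begin{aligned}
D'D &= C'W\Lambda^{1/2}\Lambda^{1/2}WC = C'W\Lambda WC, \\
C'\Lambda^{-1/2}D &= C'\Lambda^{-1/2}\Lambda^{1/2}WC = C'WC, \\
C'\Lambda^{-1/2}\left[I_L - D(D'D)^{-1}D'\right]\Lambda^{-1/2}C
  &= C'\Lambda^{-1}C - (C'\Lambda^{-1/2}D)(D'D)^{-1}(D'\Lambda^{-1/2}C) \\
  &= C'\Lambda^{-1}C - (C'WC)(C'W\Lambda WC)^{-1}(C'WC).
\end{aligned}
```

Idempotency of $I_L - D(D'D)^{-1}D'$ then gives the quadratic form $[M\Lambda^{-1/2}C]'[M\Lambda^{-1/2}C]$ with $M = I_L - D(D'D)^{-1}D'$, which is positive semi-definite.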

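8.6. a. First, $\Omega^{-1}u_i = (\sigma^{11}u_{i1} + \sigma^{12}u_{i2},\ \sigma^{12}u_{i1} + \sigma^{22}u_{i2})'$. Therefore,

$$Z_i'\Omega^{-1}u_i = \begin{pmatrix} z_{i1}' & 0 \\ 0 & z_{i2}' \end{pmatrix}\begin{pmatrix} \sigma^{11}u_{i1} + \sigma^{12}u_{i2} \\ \sigma^{12}u_{i1} + \sigma^{22}u_{i2} \end{pmatrix} = \begin{pmatrix} z_{i1}'(\sigma^{11}u_{i1} + \sigma^{12}u_{i2}) \\ z_{i2}'(\sigma^{12}u_{i1} + \sigma^{22}u_{i2}) \end{pmatrix}.$$

The expected value of this vector depends on $E(z_{i1}'u_{i1})$, $E(z_{i1}'u_{i2})$, $E(z_{i2}'u_{i1})$, and $E(z_{i2}'u_{i2})$. If $E(z_{i1}'u_{i2}) \ne 0$ or $E(z_{i2}'u_{i1}) \ne 0$ then $E(Z_i'\Omega^{-1}u_i) \ne 0$ except by fluke. In fact, if $E(z_{i1}'u_{i1}) = 0$, $E(z_{i2}'u_{i2}) = 0$, and $\sigma^{12} \ne 0$ then $E(Z_i'\Omega^{-1}u_i) \ne 0$ if $E(z_{i1}'u_{i2}) \ne 0$ or $E(z_{i2}'u_{i1}) \ne 0$.

b. When $\sigma_{12} = 0$, $\sigma^{12} = 0$, in which case $E(z_{i1}'u_{i1}) = 0$ and $E(z_{i2}'u_{i2}) = 0$ imply $E(Z_i'\Omega^{-1}u_i) = 0$.

c. If the same instruments are valid in each equation, so that $E(z_i'u_{i1}) = E(z_i'u_{i2}) = 0$, then $E(Z_i'\Omega^{-1}u_i) = 0$ without restrictions on $\Omega$.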

8.7. When $\hat{\Omega}$ is diagonal and $Z_i$ has the form in (8.15), $\sum_{i=1}^N Z_i'\hat{\Omega}Z_i = Z'(I_N \otimes \hat{\Omega})Z$ is a block diagonal matrix with $g$th block $\hat{\sigma}_g^2\left(\sum_{i=1}^N z_{ig}'z_{ig}\right) \equiv \hat{\sigma}_g^2Z_g'Z_g$, where $Z_g$ denotes the $N \times L_g$ observation matrix of instruments for the $g$th equation. Further, $Z'X$ is block diagonal with $g$th block $Z_g'X_g$. Using these facts, it is now straightforward to show that the 3SLS estimator consists of $[X_g'Z_g(Z_g'Z_g)^{-1}Z_g'X_g]^{-1}X_g'Z_g(Z_g'Z_g)^{-1}Z_g'Y_g$ stacked from $g = 1,\ldots,G$. This is just the system 2SLS estimator or, equivalently, 2SLS equation-by-equation.

8.8. a. With $Z_i = (z_{i1}', z_{i2}', \ldots, z_{iT}')'$ and $X_i = (x_{i1}', x_{i2}', \ldots, x_{iT}')'$,

$$Z_i'Z_i = \sum_{t=1}^T z_{it}'z_{it}, \quad Z_i'X_i = \sum_{t=1}^T z_{it}'x_{it}, \quad \text{and} \quad Z_i'y_i = \sum_{t=1}^T z_{it}'y_{it}.$$

Summing over all $i$ gives

$$Z'Z = \sum_{i=1}^N\sum_{t=1}^T z_{it}'z_{it}, \quad Z'X = \sum_{i=1}^N\sum_{t=1}^T z_{it}'x_{it}, \quad \text{and} \quad Z'Y = \sum_{i=1}^N\sum_{t=1}^T z_{it}'y_{it}.$$

b. $\mathrm{rank}\,E\left(\sum_{t=1}^T z_{it}'x_{it}\right) = K$.

c. Let $\hat{u}_i$ be the $T \times 1$ vector of pooled 2SLS residuals, $\hat{u}_i = y_i - X_i\hat{\beta}$. Then we just use (8.31) with $\hat{W} = (Z'Z/N)^{-1}$ and $\hat{\Lambda} = N^{-1}\sum_{i=1}^N Z_i'\hat{u}_i\hat{u}_i'Z_i$, cancelling $N$ everywhere:

$$[(X'Z)(Z'Z)^{-1}(Z'X)]^{-1}(X'Z)(Z'Z)^{-1}\left(\sum_{i=1}^N Z_i'\hat{u}_i\hat{u}_i'Z_i\right)(Z'Z)^{-1}(Z'X)[(X'Z)(Z'Z)^{-1}(Z'X)]^{-1}. \quad (8.67)$$

d. Using reasoning almost identical to Problem 7.7, (8.65) implies that, for $s < t$,

$$\begin{aligned}
E(u_{it}u_{is}z_{it}'z_{is}) &= E[E(u_{it}u_{is}z_{it}'z_{is}|z_{it}, u_{is}, z_{is})] \\
&= E[E(u_{it}|z_{it}, u_{is}, z_{is})u_{is}z_{it}'z_{is}] \\
&= E[0 \cdot u_{is}z_{it}'z_{is}] = 0
\end{aligned}$$

because $E(u_{it}|z_{it}, u_{is}, z_{is}) = 0$ for $s < t$. A similar argument works for $t < s$. So for all $t \ne s$,

$$E(u_{it}u_{is}z_{it}'z_{is}) = 0.$$

Similarly, (8.66) and iterated expectations imply that

$$E(u_{it}^2z_{it}'z_{it}) = E[E(u_{it}^2z_{it}'z_{it}|z_{it})] = E[E(u_{it}^2|z_{it})z_{it}'z_{it}] = \sigma^2E(z_{it}'z_{it}), \quad t = 1,\ldots,T.$$

Together, these results imply that

$$\mathrm{Var}(Z_i'u_i) = \sigma^2\sum_{t=1}^T E(z_{it}'z_{it}).$$

A consistent estimator of this matrix is $\hat{\sigma}^2(Z'Z/N)$, where $\hat{\sigma}^2 = (NT)^{-1}\sum_{i=1}^N\sum_{t=1}^T \hat{u}_{it}^2$, by the usual law-of-large-numbers arguments. A degrees of freedom adjustment replaces $NT$ with $NT - K$. Replacing $\sum_{i=1}^N Z_i'\hat{u}_i\hat{u}_i'Z_i$ in (8.67) with $\hat{\sigma}^2(Z'Z)$ [since $\hat{\sigma}^2(Z'Z/N)$ can play the role of $\hat{\Lambda}$ under the maintained assumptions] and cancelling gives the estimated asymptotic variance of $\hat{\beta}$ as

$$\hat{\sigma}^2[(X'Z)(Z'Z)^{-1}(Z'X)]^{-1}.$$

This is exactly the variance estimator that would be computed from the pooled 2SLS estimation. This means that the usual 2SLS standard errors and test statistics are asymptotically valid.
e. If the unconditional variance changes across $t$, the simplest approach is to weight the variables in each time period by $1/\hat{\sigma}_t$, where $\hat{\sigma}_t^2$ is a consistent estimator of $\sigma_t^2 = \mathrm{Var}(u_{it})$. A consistent estimator of $\sigma_t^2$ is

$$\hat{\sigma}_t^2 = N^{-1}\sum_{i=1}^N \hat{u}_{it}^2.$$

Now, apply pooled 2SLS to the equation

$$(y_{it}/\hat{\sigma}_t) = (x_{it}/\hat{\sigma}_t)\beta + error_{it}$$

using instruments $z_{it}/\hat{\sigma}_t$. The usual statistics from this procedure are asymptotically valid: it can be shown that it has the same $\sqrt{N}$-asymptotic distribution as if we knew the $\sigma_t^2$. This estimator is a generalized instrumental variables (GIV) estimator except it is consistent under the contemporaneous exogeneity assumption only. It turns out to be identical to the GMM estimator that uses weighting matrix $\left(N^{-1}\sum_{i=1}^N\sum_{t=1}^T \hat{\sigma}_t^2z_{it}'z_{it}\right)^{-1}$, the optimal weighting matrix under the assumptions in the problem. See Im, Ahn, Schmidt, and Wooldridge (1999, Section 2) for discussion of a more general result.


8.9. The optimal instruments are given in Theorem 8.5, with $G = 1$:

$$z_i^* = [\omega(z_i)]^{-1}E(x_i|z_i), \quad \omega(z_i) = E(u_i^2|z_i).$$

If $E(u_i^2|z_i) = \sigma^2$ and $E(x_i|z_i) = z_i\Pi$, then the optimal instruments are $\sigma^{-2}z_i\Pi$. The constant multiple $\sigma^{-2}$ clearly has no effect on the optimal IV estimator, so the optimal instruments are $z_i\Pi$. These are the optimal IVs underlying 2SLS, except that $\Pi$ is replaced with its $\sqrt{N}$-consistent OLS estimator. The 2SLS estimator has the same asymptotic variance whether $\Pi$ or $\hat{\Pi}$ is used, and so 2SLS is asymptotically efficient.

If $E(u|x) = 0$ and $E(u^2|x) = \sigma^2$, the optimal instruments are $\sigma^{-2}E(x|x) = \sigma^{-2}x$, and this leads to the OLS estimator.


8.10. a. Write $u_{it} = \rho u_{i,t-1} + e_{it}$, and plug into $y_{it} = x_{it}\beta + u_{it}$ to get

$$y_{it} = x_{it}\beta + \rho u_{i,t-1} + e_{it}, \quad t = 2,\ldots,T.$$

Under the assumption

$$E(u_{it}|z_{it}, u_{i,t-1}, x_{i,t-1}, z_{i,t-1}, x_{i,t-2}, \ldots, u_{i1}, x_{i1}, z_{i1}) = 0 \quad (8.68)$$

the previous equation satisfies the dynamic completeness assumption when $\rho = 0$. If we assume that $E(u_{it}^2|z_{it}, u_{i,t-1})$ is constant under $H_0$, then it satisfies the requisite homoskedasticity assumption as well. As shown in Problem 8.8, pooled 2SLS estimation of this equation using instruments $(z_{it}, u_{i,t-1})$ results in valid test statistics.

Now we apply the results from Section 6.1.3: when $\rho = 0$, replacing $u_{i,t-1}$ with the initial 2SLS residuals $\hat{u}_{i,t-1}$ has no effect as $N$ gets large, provided that (8.68) holds. Thus, we can estimate

$$y_{it} = x_{it}\beta + \rho\hat{u}_{i,t-1} + error_{it}, \quad t = 2,\ldots,T,$$

by pooled 2SLS using instruments $(z_{it}, \hat{u}_{i,t-1})$, and obtain the usual $t$ statistic for $\hat{\rho}$.

b. If $E(u_{it}^2|z_{it}, u_{i,t-1})$ is not constant, we can use the usual heteroskedasticity-robust $t$ statistic from pooled 2SLS for $\hat{\rho}$. This allows for dynamic forms of heteroskedasticity, such as ARCH and GARCH, as well as static forms of heteroskedasticity.

8.11. a. This is a simple application of Theorem 8.5 when $G = 1$. Without the $i$ subscript, $x_1 = (z_1, y_2)$ and so $E(x_1|z) = [z_1, E(y_2|z)]$. Further, $\Omega(z) = \mathrm{Var}(u_1|z) = \sigma_1^2$. It follows that the optimal instruments are $(1/\sigma_1^2)[z_1, E(y_2|z)]$. Dropping the division by $\sigma_1^2$ clearly does not affect the optimal instruments.

b. If $y_2$ is binary then $E(y_2|z) = P(y_2 = 1|z) = F(z)$, and so the optimal IVs are $[z_1, F(z)]$.

8.12. a. As long as $E(Z_i'u_i) = 0$ holds the estimator is consistent. After all, it is a GMM estimator with a particular weighting matrix that satisfies all of the GMM regularity conditions.

b. Unless the optimal weighting matrix $\hat{W}$ consistently estimates $[\mathrm{Var}(Z_i'u_i)]^{-1}$, the statistic fails to be asymptotically chi-square.

c. Since $\hat{\Omega}$ and $\hat{\Lambda}$ converge to the same constant matrix $\Lambda = \Omega$, there is no difference in asymptotic efficiency (at least using the usual $\sqrt{N}$-asymptotic distribution).

8.13. a. The optimal instrumental variable is

$$z^* = [E(u_1^2|z)]^{-1}E[(z_1, y_2, z_1y_2)|z] = (\sigma_1^2)^{-1}[z_1, E(y_2|z), z_1E(y_2|z)] = (\sigma_1^2)^{-1}[z_1, z\pi_2, z_1(z\pi_2)].$$

b. The coefficients $\pi_2$ can be estimated by running an OLS regression of $y_2$ on $z$. Since the inverse of the variance is a scalar that does not depend on $z$, it cancels out in the IV estimation. Thus we can operationalize the optimal IV estimator by using $[z_1, z\hat{\pi}_2, z_1(z\hat{\pi}_2)]$ as the IVs. The estimator has the same $\sqrt{N}$-asymptotic distribution as if we knew $\pi_2$.

8.14. a. With $y_{it2} = z_{it}\Pi_2 + v_{it2}$ and $E(z_{it}'u_{it1}) = 0$, $t = 1,\ldots,T$ maintained, $E(y_{it2}'u_{it1}) = 0$ is the same as $E(v_{it2}'u_{it1}) = 0$. We can always write the linear projection of $u_{it1}$ onto $v_{it2}$ as

$$u_{it1} = v_{it2}\rho_1 + e_{it1}, \quad E(v_{it2}'e_{it1}) = 0, \quad t = 1,\ldots,T,$$

where we assume that the coefficients $\rho_1$ do not change over time. Thus, we can write the extended equation

$$y_{it1} = \theta_{t1} + z_{it1}\delta_1 + y_{it2}\alpha_1 + v_{it2}\rho_1 + e_{it1}, \quad t = 1,\ldots,T.$$

Now the control function procedure is clear. (1) Estimate the reduced form $y_{it2} = z_{it}\Pi_2 + v_{it2}$ by pooled OLS (equation-by-equation if necessary when $y_{it2}$ is a vector) and obtain the residuals, $\hat{v}_{it2}$. (2) Run the pooled OLS regression

$$y_{it1} \text{ on } 1, d2_t, \ldots, dT_t, z_{it1}, y_{it2}, \hat{v}_{it2}, \quad t = 1,\ldots,T;\ i = 1,\ldots,N,$$

and use a fully robust Wald test of $H_0: \rho_1 = 0$. The test has $G_1$ degrees of freedom in the chi-square distribution, or one can use an F approximation by dividing the chi-square statistic by $G_1$.
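The two steps of the control function procedure can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the computation used for any result in this manual: it uses a single cross section rather than a panel, one endogenous regressor, hypothetical variable names, and a small hand-written OLS routine (`ols`) in place of a statistics package. It also reports point estimates only; the robust Wald test itself is not computed.

```python
import random

def ols(X, y):
    """OLS coefficients via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)] +
         [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(1)
n = 2000
z1, z2, y2, y1 = [], [], [], []
for _ in range(n):
    a, b = random.gauss(0, 1), random.gauss(0, 1)   # exogenous variables
    v = random.gauss(0, 1)                          # reduced-form error
    u = 0.7 * v + random.gauss(0, 1)                # structural error, correlated with v
    e = 0.5 * a + 1.0 * b + v                       # endogenous regressor
    z1.append(a); z2.append(b); y2.append(e)
    y1.append(1.0 + 0.5 * a - 1.0 * e + u)

# Step (1): reduced form of y2 on all exogenous variables; keep the residuals.
Z = [[1.0, a, b] for a, b in zip(z1, z2)]
pi_hat = ols(Z, y2)
fitted = [sum(p * zj for p, zj in zip(pi_hat, row)) for row in Z]
v2hat = [e - f for e, f in zip(y2, fitted)]

# Step (2): add the residuals to the structural equation and run OLS.
cf = ols([[1.0, a, e, vh] for a, e, vh in zip(z1, y2, v2hat)], y1)

# 2SLS for comparison: replace y2 by its first-stage fitted value.
tsls = ols([[1.0, a, f] for a, f in zip(z1, fitted)], y1)

print("CF coefficient on y2:  ", round(cf[2], 6))
print("2SLS coefficient on y2:", round(tsls[2], 6))
print("coefficient on v2hat:  ", round(cf[3], 3))
```

The coefficient on `y2` from step (2) is algebraically identical to the 2SLS estimate, which is why the control function regression in Problem 8.15f reproduces the IV coefficient on lfare. The coefficient on `v2hat` estimates the projection coefficient (0.7 in this simulated design), and testing it against zero is the exogeneity test.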

b. Extending the discussion in the text around equation (6.32), partition $z_{it2} = (g_{it2}, h_{it2})$ where $g_{it2}$ is $1 \times G_1$ (the same dimension as $y_{it2}$) and $h_{it2}$ is $1 \times Q_1$. Obtain the fitted values $\hat{y}_{it2}$ from the first-stage regressions. Then, obtain the residuals, $\hat{r}_{it2}$, from the pooled OLS regression

$$h_{it2} \text{ on } z_{it1}, \hat{y}_{it2}, \quad t = 1,\ldots,T;\ i = 1,\ldots,N.$$

Let $\hat{u}_{it1}$ be the P2SLS residuals. Then run the pooled OLS regression

$$\hat{u}_{it1} \text{ on } \hat{r}_{it2}, \quad t = 1,\ldots,T;\ i = 1,\ldots,N,$$

and test the $\hat{r}_{it2}$ for joint significance. A fully robust Wald test is most appropriate, and its limiting distribution under the null that all elements of $z_{it}$ are exogenous is $\chi^2_{Q_1}$.

8.15. a. The coefficient shows that a higher fare reduces passenger demand for flights. The estimated elasticity is -.565, which is fairly large. Even the fully robust 95% confidence interval is pretty narrow, from -.696 to -.434. Incidentally, the standard error that is robust only to heteroskedasticity and not serial correlation is about .0364, which is actually slightly smaller than the usual OLS standard error. So it is important to use the fully robust version.
use airfare
xtset id year
       panel variable:  id (strongly balanced)
        time variable:  year, 1997 to 2000
                delta:  1 unit

reg lpassen y98 y99 y00 lfare ldist ldistsq

      Source |       SS       df       MS
-------------+------------------------------
       Model |  230.557732     6  38.4262887
    Residual |  3360.12968  4589  .732213921
-------------+------------------------------
       Total |  3590.68741  4595  .781433605

----------------------------------------------------------------------
     lpassen |      Coef.   Std. Err.      t      [95% Conf. Interval]
-------------+--------------------------------------------------------
         y98 |   .0321212   .0357118     0.90    -.0378911    .1021335
         y99 |    .081651    .035724     2.29     .0116148    .1516873
         y00 |   .1380369   .0358761     3.85     .0677024    .2083713
       lfare |  -.5647711   .0369644   -15.28    -.6372392   -.4923031
       ldist |   -1.54939   .3265076    -4.75    -2.189502   -.9092778
     ldistsq |   .1227088   .0247935     4.95     .0741017     .171316
       _cons |   13.65144   1.094166    12.48     11.50635    15.79653
----------------------------------------------------------------------

reg lpassen y98 y99 y00 lfare ldist ldistsq, cluster(id)

Linear regression

                          (Std. Err. adjusted for 1149 clusters in id)
--------------------------------------------------------------
             |               Robust
     lpassen |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         y98 |   .0321212   .0050262      .0222597    .0419827
         y99 |    .081651   .0073679      .0671949    .0961072
         y00 |   .1380369                 .1174636    .1586101
       lfare |  -.5647711                -.6956597   -.4338826
       ldist |   -1.54939                -2.919242    -.179538
     ldistsq |   .1227088                 .0198916    .2255261
       _cons |   13.65144                 9.106074     18.1968
--------------------------------------------------------------


b. I use the test that allows the explanatory variables to be non-strictly exogenous. The estimate of $\rho$ is essentially one. In a pure time series context we would have to worry how this amount of persistence in the errors affects inference. Here, inference is standard because it is with fixed $T$ and $N \to \infty$. But the "unit root" in $\{u_{it} : t = 1,\ldots,T\}$ is of some concern because it calls into question whether there is a meaningful relationship between passenger demand and airfares. If the error term rarely returns to its mean (which we can take to be zero), in what sense do movements in airfare over time cause movements in passenger demand?


predict uhat, resid

gen uhat_1 = l.uhat
(1149 missing values generated)

reg lpassen y99 y00 lfare ldist ldistsq uhat_1, robust

Linear regression

--------------------------------------------------------------
             |               Robust
     lpassen |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         y99 |   .0502195   .0065875      .0373036    .0631354
         y00 |   .1105098   .0072252      .0963437     .124676
       lfare |   -.628955   .0095767     -.6477315   -.6101784
       ldist |  -1.549142   .0726222     -1.691528   -1.406755
     ldistsq |   .1269054   .0055092      .1161037    .1377071
      uhat_1 |   1.005428   .0062555      .9931627    1.017693
       _cons |   13.81801   .2389316      13.34955    14.28647
--------------------------------------------------------------


c. The coefficient on $concen_{it}$ is .360 and the t statistic that accounts for heteroskedasticity and serial correlation is 6.15. Therefore, the partial correlation between lfare and concen is enough to implement an IV procedure.


reg lfare y98 y99 y00 ldist ldistsq concen, cluster(id)

Linear regression                                      Number of obs =    4596
                                                       F(  6,  1148) =  205.63
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4062
                                                       Root MSE      =  .33651

                          (Std. Err. adjusted for 1149 clusters in id)
--------------------------------------------------------------
             |               Robust
       lfare |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         y98 |   .0211244   .0041474      .0129871    .0292617
         y99 |   .0378496   .0051795      .0276872     .048012
         y00 |     .09987   .0056469      .0887906    .1109493
       ldist |                           -1.435168   -.3680328
     ldistsq |                            .0634647    .1425745
      concen |                            .2452315    .4750092
       _cons |                            4.420364    7.998151
--------------------------------------------------------------

d. The IV estimates are given below. The estimated elasticity is huge, -1.78. This seems very large. The fully robust standard error is about twice as large as the usual (nonrobust) standard error, and the fully robust 95% confidence interval is -2.71 to -.84, which is very wide, but it excludes the point estimate from pooled OLS (-.565).
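The interval arithmetic in part d can be checked directly. The sketch below copies the clustered IV estimate and standard error for lfare and the pooled OLS estimate from the Stata output for parts a and d; the only assumption is the use of the standard normal critical value 1.96 in place of the t(1148) value Stata uses.

```python
# Values copied from the clustered ivreg output (part d) and the pooled OLS output (part a).
b_iv, se_iv = -1.776549, 0.4753368
b_pols = -0.5647711

crit = 1.96  # standard normal critical value; Stata uses t(1148), about 1.962
lower, upper = b_iv - crit * se_iv, b_iv + crit * se_iv
print(round(lower, 2), round(upper, 2))                           # prints: -2.71 -0.84
print("POLS estimate inside the interval:", lower <= b_pols <= upper)  # False
```

Because the two critical values differ only in the third decimal, the rounded endpoints match the interval reported by Stata.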


ivreg lpassen y98 y99 y00 ldist ldistsq (lfare=concen)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS
-------------+------------------------------
       Model | -556.334915     6 -92.7224858
    Residual |  4147.02233  4589  .903687586
-------------+------------------------------
       Total |  3590.68741  4595  .781433605

----------------------------------------------------------------------
     lpassen |      Coef.   Std. Err.      t      [95% Conf. Interval]
-------------+--------------------------------------------------------
       lfare |  -1.776549   .2358788    -7.53    -2.238985   -1.314113
         y98 |   .0616171   .0400745     1.54    -.0169481    .1401824
         y99 |   .1241675   .0405153     3.06      .044738    .2035971
         y00 |   .2542695   .0456607     5.57     .1647525    .3437865
       ldist |  -2.498972   .4058371    -6.16    -3.294607   -1.703336
     ldistsq |   .2314932   .0345468     6.70     .1637648    .2992216
       _cons |   21.21249   1.891586    11.21     17.50407     24.9209
----------------------------------------------------------------------
Instrumented:  lfare
Instruments:   y98 y99 y00 ldist ldistsq concen

ivreg lpassen y98 y99 y00 ldist ldistsq (lfare=concen), cluster(id)

Instrumental variables (2SLS) regression               Root MSE      =  .95062

                          (Std. Err. adjusted for 1149 clusters in id)
----------------------------------------------------------------------
             |               Robust
     lpassen |      Coef.   Std. Err.      t      [95% Conf. Interval]
-------------+--------------------------------------------------------
       lfare |  -1.776549   .4753368    -3.74    -2.709175   -.8439226
         y98 |   .0616171   .0131531     4.68     .0358103    .0874239
         y99 |   .1241675   .0183335     6.77     .0881967    .1601384
         y00 |   .2542695   .0458027     5.55      .164403    .3441359
       ldist |  -2.498972    .831401    -3.01    -4.130207   -.8677356
     ldistsq |   .2314932   .0705247     3.28     .0931215    .3698649
       _cons |   21.21249   3.860659     5.49     13.63775    28.78722
----------------------------------------------------------------------
Instrumented:  lfare
Instruments:   y98 y99 y00 ldist ldistsq concen

e. To compute the asymptotic standard error of $\sqrt{N}(\hat{\beta}_{1,P2SLS} - \hat{\beta}_{1,POLS})$ using the traditional Hausman approach, we have to maintain enough assumptions so that POLS is relatively efficient under the null. Letting

$$w_{it} = (1, y98_t, y99_t, y00_t, lfare_{it}, ldist_i, ldist_i^2, concen_{it})$$

we would have to assume, under $H_0$,

$$\begin{aligned}
E(w_{it}'u_{it1}) &= 0, \quad t = 1,\ldots,T \\
E(u_{it1}^2|w_{it}) &= \sigma^2, \quad t = 1,\ldots,T \\
E(u_{it1}u_{ir1}|w_{it}, w_{ir}) &= 0, \quad r \ne t.
\end{aligned}$$

The first assumption must be maintained under the null for the test to make sense. The second assumption, homoskedasticity, can never be guaranteed, and so it is always a good idea to make tests robust to heteroskedasticity. The current application is a static equation, and so the assumption of no serial correlation is especially strong. In fact, from part b we already have good evidence that there is substantial serial correlation in the errors (although this test maintains contemporaneous exogeneity of $lfare_{it}$, along with the distance variables).

f. The Stata commands are given below. The fully robust t statistic on $\hat{v}_{it2}$ is 2.92, which is a strong rejection of the null that $lfare_{it}$ is (contemporaneously) exogenous, assuming that $concen_{it}$ is contemporaneously exogenous.
qui reg lfare y98 y99 y00 ldist ldistsq concen
predict v2hat, resid

reg lpassen y98 y99 y00 lfare ldist ldistsq v2hat, cluster(id)

Linear regression                                      Number of obs =    4596
                                                       F(  7,  1148) =   31.50
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0711
                                                       Root MSE      =  .85265

                          (Std. Err. adjusted for 1149 clusters in id)
--------------------------------------------------------------
             |               Robust
     lpassen |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         y98 |   .0616171   .0112127      .0396175    .0836167
         y99 |   .1241675   .0160906      .0925973    .1557378
         y00 |   .2542695    .040158      .1754782    .3330608
       lfare |  -1.776549   .4197937     -2.600198   -.9528999
       ldist |  -2.498972    .767078     -4.004004   -.9939395
     ldistsq |   .2314932   .0640361      .1058524    .3571341
       v2hat |   1.249653   .4273322      .4112137    2.088093
       _cons |   21.21249    3.46901      14.40618     28.0188
--------------------------------------------------------------

8.16 (Bonus Question). Consider the GIV estimator with incorrect restrictions imposed on the estimator of $\Omega$. That is, in (8.47) use $\hat{\Lambda}$ in place of $\hat{\Omega}$ with $\hat{\Lambda} \overset{p}{\to} \Lambda \ne \Omega$.

a. If Assumption GIV.1 holds, that is $E(Z_i \otimes u_i) = 0$, argue that the GIV estimator is still consistent under an appropriate rank condition (and state the rank condition).

b. Argue that, under the assumptions of part a, the GIV estimator that uses $\hat{\Lambda}$ is $\sqrt{N}$-asymptotically equivalent to the (infeasible) GIV estimator that uses $\Lambda$.

c. If you insist on using $\hat{\Lambda}$ but want to guard against inappropriate inference, what would you do?

Solution

a. From equation (8.47), and applying the law of large numbers, the key orthogonality condition for consistency is

$$E(Z_i'\Lambda^{-1}u_i) = 0$$

because $\hat{\Lambda} \overset{p}{\to} \Lambda$. But if Assumption GIV.1 holds, any linear combination of $Z_i$ is uncorrelated with $u_i$, including $\Lambda^{-1}Z_i$. There are two parts to the rank condition, with the first being the most important:

$$\mathrm{rank}\,E(Z_i'\Lambda^{-1}Z_i) = L$$
$$\mathrm{rank}\,E(Z_i'\Lambda^{-1}X_i) = K$$

b. The same line of reasoning that we used for FGLS in Chapter 7 can be used. First, using the same trick with the Kronecker product,

$$N^{-1}\sum_{i=1}^N Z_i'\hat{\Lambda}^{-1}Z_i = N^{-1}\sum_{i=1}^N Z_i'\Lambda^{-1}Z_i + o_p(1) \overset{p}{\to} E(Z_i'\Lambda^{-1}Z_i)$$

$$N^{-1}\sum_{i=1}^N Z_i'\hat{\Lambda}^{-1}X_i = N^{-1}\sum_{i=1}^N Z_i'\Lambda^{-1}X_i + o_p(1) \overset{p}{\to} E(Z_i'\Lambda^{-1}X_i)$$

Second,

$$\begin{aligned}
N^{-1/2}\sum_{i=1}^N Z_i'\hat{\Lambda}^{-1}u_i - N^{-1/2}\sum_{i=1}^N Z_i'\Lambda^{-1}u_i &= \left[N^{-1/2}\sum_{i=1}^N (u_i \otimes Z_i)'\right]\mathrm{vec}(\hat{\Lambda}^{-1} - \Lambda^{-1}) \\
&= O_p(1) \cdot o_p(1) = o_p(1).
\end{aligned}$$

Combining these asymptotic equivalences shows that replacing $\Lambda$ with the consistent estimator $\hat{\Lambda}$ does not affect the $\sqrt{N}$-limiting distribution of the GIV estimator.
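c. Use a fully robust asymptotic variance matrix estimator. Write

$$\widehat{\mathrm{Avar}}[\sqrt{N}(\hat{\beta} - \beta)] = \hat{A}^{-1}\hat{B}\hat{A}^{-1}$$

where

$$\hat{A} = \left(N^{-1}\sum_{i=1}^N X_i'\hat{\Lambda}^{-1}Z_i\right)\left(N^{-1}\sum_{i=1}^N Z_i'\hat{\Lambda}^{-1}Z_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N Z_i'\hat{\Lambda}^{-1}X_i\right)$$

$$\hat{B} = \left(N^{-1}\sum_{i=1}^N X_i'\hat{\Lambda}^{-1}Z_i\right)\left(N^{-1}\sum_{i=1}^N Z_i'\hat{\Lambda}^{-1}Z_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N Z_i'\hat{\Lambda}^{-1}\hat{u}_i\hat{u}_i'\hat{\Lambda}^{-1}Z_i\right)\left(N^{-1}\sum_{i=1}^N Z_i'\hat{\Lambda}^{-1}Z_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N Z_i'\hat{\Lambda}^{-1}X_i\right)$$

and where $\hat{u}_i = y_i - X_i\hat{\beta}$ are the GIV residuals. This asymptotic variance matrix estimator allows $E(u_iu_i') \ne \Lambda$ as well as system heteroskedasticity, that is, $E(u_iu_i'|Z_i) \ne E(u_iu_i')$. Of course, we get $\widehat{\mathrm{Avar}}(\hat{\beta})$ as $\hat{A}^{-1}\hat{B}\hat{A}^{-1}/N$, whereby all of the divisions by $N$ disappear.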
8.17 (Bonus Question). Consider a panel data model with contemporaneously exogenous instruments $z_{it}$:

$$y_{it} = x_{it}\beta + u_{it},$$
$$E(z_{it}'u_{it}) = 0, \quad t = 1,\ldots,T,$$

where $x_{it}$ is $1 \times K$ and $z_{it}$ is $1 \times L$ for all $t$, $L \ge K$.

a. If we maintain the assumptions

ASSUMPTION P2SLS.1: $E(z_{it}'u_{it}) = 0$, $t = 1,\ldots,T$

ASSUMPTION P2SLS.2: (a) $\mathrm{rank}\,\sum_{t=1}^T E(z_{it}'z_{it}) = L$; (b) $\mathrm{rank}\,\sum_{t=1}^T E(z_{it}'x_{it}) = K$,

argue that the pooled 2SLS (P2SLS) estimator is generally consistent (as always with $T$ fixed, $N \to \infty$, and random sampling across $i$).

b. Explain how to estimate the asymptotic variance matrix of the P2SLS estimator under the assumptions in part a.

c. Suppose we add the assumption

ASSUMPTION P2SLS.3: (a) $E(u_{it}^2z_{it}'z_{it}) = \sigma^2E(z_{it}'z_{it})$, $t = 1,\ldots,T$; (b) $E(u_{it}u_{ir}z_{it}'z_{ir}) = 0$, $t \ne r$.

Argue that the usual 2SLS variance matrix estimator that assumes homoskedasticity and ignores the time series component is valid.

d. What would you do if Assumption P2SLS.3(b) holds but not necessarily P2SLS.3(a)?

Solution 

a. Using the general formula for the S2SLS estimator, we can write the P2SLS estimator 


(with probability approaching one) as 


Notice how the law of large numbers implies 


N1! SS oai > Feiz) 


i=1 t=1 


No ray > Frein) 


i=1 t=1 


and the rank condition states that these matrices of ranks L and K, respectively. Therefore, the 


plim can pass through all inverses. We also apply the WLLN and Assumption P2SLS.1 to get 


N1! zie? > F Eei) = = 0. 


i=1 t=1 


Now we just pass the plim through using Slutsky’s Theorem to get ĝ 4 B. 


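b. We have

$$\text{Avar}[\sqrt{N}(\hat{\beta} - \beta)] = A^{-1}BA^{-1},$$

where

$$A = \left(\sum_{t=1}^{T} E(x_{it}'z_{it})\right)\left(\sum_{t=1}^{T} E(z_{it}'z_{it})\right)^{-1}\left(\sum_{t=1}^{T} E(z_{it}'x_{it})\right)$$

$$B = \left(\sum_{t=1}^{T} E(x_{it}'z_{it})\right)\left(\sum_{t=1}^{T} E(z_{it}'z_{it})\right)^{-1}\left(\sum_{t=1}^{T}\sum_{r=1}^{T} E(u_{it}u_{ir}z_{it}'z_{ir})\right)\left(\sum_{t=1}^{T} E(z_{it}'z_{it})\right)^{-1}\left(\sum_{t=1}^{T} E(z_{it}'x_{it})\right).$$

We can consistently estimate each of these matrices:

$$\hat{A} = \left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}'z_{it}\right)\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}'z_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}'x_{it}\right)$$

$$\hat{B} = \left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}'z_{it}\right)\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}'z_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{r=1}^{T} \hat{u}_{it}\hat{u}_{ir}z_{it}'z_{ir}\right)$$
$$\qquad\cdot \left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}'z_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}'x_{it}\right),$$

where $\hat{u}_{it} = y_{it} - x_{it}\hat{\beta}$ are the P2SLS residuals.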
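c. With Assumption P2SLS.3,

$$\sum_{t=1}^{T}\sum_{r=1}^{T} E(u_{it}u_{ir}z_{it}'z_{ir}) = \sum_{t=1}^{T} E(u_{it}^2 z_{it}'z_{it}) = \sigma^2\sum_{t=1}^{T} E(z_{it}'z_{it}),$$

where the first equality follows from $E(u_{it}u_{ir}z_{it}'z_{ir}) = 0$, $t \neq r$, and the second follows from $E(u_{it}^2 z_{it}'z_{it}) = \sigma^2 E(z_{it}'z_{it})$, $t = 1,\ldots,T$. Therefore,

$$B = \sigma^2\left(\sum_{t=1}^{T} E(x_{it}'z_{it})\right)\left(\sum_{t=1}^{T} E(z_{it}'z_{it})\right)^{-1}\left(\sum_{t=1}^{T} E(z_{it}'x_{it})\right) = \sigma^2 A,$$

and so

$$\text{Avar}[\sqrt{N}(\hat{\beta} - \beta)] = \sigma^2 A^{-1}.$$

When we use $\hat{A}$ from part b and a consistent estimator of $\sigma^2$ (with optional but standard degrees-of-freedom adjustment),

$$\hat{\sigma}^2 = (NT - K)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{u}_{it}^2,$$

then we get

$$\widehat{\text{Avar}}(\hat{\beta}) = \hat{\sigma}^2\left[\left(\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}'z_{it}\right)\left(\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}'z_{it}\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}'x_{it}\right)\right]^{-1},$$

which is exactly the standard formula for 2SLS treating the panel data set as one long cross section.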


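d. We need to make the variance matrix robust to heteroskedasticity only. So

$$\hat{B} = \left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}'z_{it}\right)\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}'z_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} \hat{u}_{it}^2 z_{it}'z_{it}\right)$$
$$\qquad\cdot \left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}'z_{it}\right)^{-1}\left(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T} z_{it}'x_{it}\right).$$

The resulting $\widehat{\text{Avar}}(\hat{\beta})$ is exactly what would be computed by treating the panel data set as one long cross section with inference robust to heteroskedasticity.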
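The three variance estimators in parts b, c, and d can be compared in a minimal sketch for the special case $K = L = 1$, where every matrix in the sandwich formula is a scalar and $\hat{A}^{-1}\hat{B}\hat{A}^{-1}/N$ collapses to the "meat" divided by $(\sum_i\sum_t z_{it}x_{it})^2$. The function name, the data layout, and the numbers are hypothetical and chosen only for illustration; this is not how the estimates in this manual were computed.

```python
def pooled_2sls_scalar(y, x, z):
    """Pooled 2SLS with one regressor and one instrument.

    y, x, z are lists of per-unit lists: y[i][t], x[i][t], z[i][t].
    Returns the estimate and three variance estimates for beta-hat.
    """
    n = len(y)
    cells = [(i, t) for i in range(n) for t in range(len(y[i]))]
    s_zx = sum(z[i][t] * x[i][t] for i, t in cells)
    s_zy = sum(z[i][t] * y[i][t] for i, t in cells)
    s_zz = sum(z[i][t] ** 2 for i, t in cells)

    # Just-identified case: P2SLS reduces to the simple pooled IV estimator.
    beta = s_zy / s_zx
    u = [[y[i][t] - x[i][t] * beta for t in range(len(y[i]))] for i in range(n)]

    # Part b: fully robust; keeps the t != r cross products within each unit.
    meat_full = sum(sum(z[i][t] * u[i][t] for t in range(len(u[i]))) ** 2
                    for i in range(n))
    # Part d: robust to heteroskedasticity only; cross products are dropped.
    meat_het = sum((z[i][t] * u[i][t]) ** 2 for i, t in cells)
    # Part c: usual formula under P2SLS.3, with the (NT - K) adjustment, K = 1.
    sigma2 = sum(u[i][t] ** 2 for i, t in cells) / (len(cells) - 1)

    return {
        "beta": beta,
        "var_full": meat_full / s_zx ** 2,
        "var_het": meat_het / s_zx ** 2,
        "var_usual": sigma2 * s_zz / s_zx ** 2,
    }


# Hypothetical data: N = 3 units, T = 2 periods.
x = [[1, 2], [3, 4], [5, 6]]
z = [[1, 1], [2, 2], [3, 3]]
y = [[2, 5], [6, 7], [11, 12]]
print(pooled_2sls_scalar(y, x, z))
```

With more regressors or instruments the same three "meat" terms appear, but the scalar divisions become matrix inverses.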
8.18 (Bonus Question). Consider the panel data model

$$y_{it} = x_{it}\beta + u_{it}, \quad t = 1,\ldots,T,$$

where $x_{it}$ is a $1 \times K$ vector and the instruments at time $t$ are $z_{it}$, a $1 \times L$ vector for all $t$. Suppose the instruments are strictly exogenous in the sense that

$$E(u_{it}|z_{i1},z_{i2},\ldots,z_{iT}) = E(u_{it}|z_i) = 0, \quad t = 1,\ldots,T.$$

Assume that $E(u_iu_i'|z_i) = E(u_iu_i') = \Omega$, where $z_i$ is the vector of all exogenous variables in all time periods. Further, assume that $\Omega$ has the AR(1) form:

$$\Omega = \sigma_u^2\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \rho^2 & \rho & 1 & \cdots & \vdots \\ \vdots & \vdots & & \ddots & \rho \\ \rho^{T-1} & \rho^{T-2} & \cdots & \rho & 1 \end{pmatrix} \equiv \sigma_u^2\Phi,$$

where $u_{it} = \rho u_{i,t-1} + e_{it}$, $t = 1,\ldots,T$.

a. If $Z_i = (z_{i1}',\ldots,z_{iT}')'$, find the matrix of transformed instruments, $\Omega^{-1/2}Z_i$.

b. Describe how to implement the GIV estimator as a particular pooled 2SLS estimation when $\Omega$ has the AR(1) structure.

c. If you think the AR(1) model might be incorrect, or the system homoskedasticity assumption does not hold, propose a simple method for obtaining valid standard errors and test statistics.

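Solution

a. From Section 7.8.6, we know that when $\Omega$ has the AR(1) structure given above, then, up to a multiplicative constant that does not affect the estimator,

$$\Omega^{-1/2}Z_i = \begin{pmatrix} (1 - \rho^2)^{1/2}z_{i1} \\ z_{i2} - \rho z_{i1} \\ \vdots \\ z_{iT} - \rho z_{i,T-1} \end{pmatrix}$$

so that, for $t \geq 2$, the transformation results in quasi-differences. For $t = 1$, the transformation ensures that the transformed errors will have common variance for all $t = 1,\ldots,T$.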
b. We need to estimate $\rho$, so we would use pooled 2SLS to get residuals, say $\hat{u}_{it}$. Then, estimate $\rho$ from the pooled OLS regression

$$\hat{u}_{it} \text{ on } \hat{u}_{i,t-1}, \quad t = 2,\ldots,T;\ i = 1,\ldots,N.$$

The GIV transformed equation is

$$(1 - \rho^2)^{1/2}y_{i1} = (1 - \rho^2)^{1/2}x_{i1}\beta + (1 - \rho^2)^{1/2}u_{i1}$$
$$y_{it} - \rho y_{i,t-1} = (x_{it} - \rho x_{i,t-1})\beta + u_{it} - \rho u_{i,t-1}, \quad t = 2,\ldots,T.$$

The GIV estimator is obtained by replacing $\rho$ with $\hat{\rho}$ and estimating

$$\tilde{y}_{it} = \tilde{x}_{it}\beta + error_{it}, \quad t = 1,\ldots,T;\ i = 1,\ldots,N$$

using IVs $\tilde{z}_{it}$, where

$$\tilde{z}_{i1} = (1 - \hat{\rho}^2)^{1/2}z_{i1}$$
$$\tilde{z}_{it} = z_{it} - \hat{\rho}z_{i,t-1}, \quad t = 2,\ldots,T,$$

and where similar definitions hold for $\tilde{y}_{it}$ and $\tilde{x}_{it}$. As always, the estimation of $\rho$ has no effect on the $\sqrt{N}$-asymptotic distribution under the strict exogeneity assumption on the IVs. The usual P2SLS statistics from the estimation on the transformed variables are asymptotically valid.
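The two mechanical steps in part b, estimating $\rho$ from lagged residuals and then quasi-differencing each variable, can be sketched as follows. The function names are hypothetical, the no-intercept regression follows the description above, and each series is treated as a scalar variable for one unit (for a vector $z_{it}$, the transformation would be applied column by column).

```python
import math


def estimate_rho(resid):
    """Pooled OLS slope (no intercept) of u_it on u_i,t-1, t = 2,...,T.

    resid is a list of per-unit residual lists: resid[i][t].
    """
    num = sum(u[t] * u[t - 1] for u in resid for t in range(1, len(u)))
    den = sum(u[t - 1] ** 2 for u in resid for t in range(1, len(u)))
    return num / den


def quasi_difference(series, rho):
    """AR(1) transformation of one unit's time series.

    The first observation is scaled by sqrt(1 - rho^2); later
    observations are quasi-differenced: s_t - rho * s_{t-1}.
    """
    first = math.sqrt(1.0 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


# Hypothetical residuals for two units, then the transform of one series.
rho_hat = estimate_rho([[1.0, 0.5, 0.25], [2.0, 1.0, 0.5]])
print(rho_hat, quasi_difference([1.0, 2.0, 3.0], rho_hat))
```

Applying `quasi_difference` to $y_{it}$, each column of $x_{it}$, and each column of $z_{it}$, and then running pooled 2SLS on the transformed data, gives the feasible GIV estimator described above.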

c. If we have misspecified $\text{Var}(u_i|z_i)$ then we should make the P2SLS inference from part b fully robust to heteroskedasticity and serial correlation. In other words, the transformed errors

$$e_{i1} = (1 - \rho^2)^{1/2}u_{i1}$$
$$e_{it} = u_{it} - \rho u_{i,t-1}, \quad t = 2,\ldots,T$$

will have serial correlation if the AR(1) model is incorrect, and such errors can always have heteroskedasticity if $u_{it}$ does. We know that the GIV estimator that uses an incorrect variance structure is still consistent and $\sqrt{N}$-asymptotically normal. We might get a more efficient estimator assuming a simple AR(1) structure than using P2SLS on the original equation: accounting for the serial correlation at all might be better than ignoring it in estimation. This is the same motivation underlying the generalized estimating equations literature when the explanatory variables are strictly exogenous.

Solutions to Chapter 9 Problems

9.1. a. No. What causal inference could one draw from this? We may be interested in the 
tradeoff between wages and benefits, but then either of these can be taken as the dependent 
variable and estimation of either equation would be by OLS. Of course, if we have omitted 
some important factors, or have a measurement error problem, OLS could be inconsistent for 
estimating the tradeoff. But there is no simultaneity problem: wages and benefits are jointly 
determined, but there is no sense in which an equation for wage and another for benefits satisfy 
the autonomy requirement. 

b. Yes. We can certainly think of an exogenous change in law enforcement expenditures 
causing a reduction in crime, and we are certainly interested in such counterfactuals. If we 
could do the appropriate experiment, where expenditures are assigned randomly across cities, 
then we could estimate the crime equation by OLS. The simultaneous equations model 
recognizes that cities choose law enforcement expenditures in part based on what they expect 
the crime rate to be. An SEM is a convenient way to allow expenditures to depend on 
unobservables (to the econometrician) that affect crime. 

c. No. These are both choice variables of the firm, and the parameters in a two-equation 
system modeling one in terms of the other, and vice versa, have no economic meaning. If we 
want to know how a change in the price of foreign technology affects foreign technology (FT) 
purchases, why would we want to hold fixed R&D spending? Clearly FT purchases and R&D 
spending are simultaneously chosen, but we should use a two-equation SUR setup where 
neither is an explanatory variable in the other’s equation. 

d. Yes. We can be interested in the causal effect of alcohol consumption on productivity, 


and therefore on wage. One’s hourly wage is determined by productivity, and other factors; alcohol consumption is determined by individual choice, where one factor is income.

e. No. These are choice variables by the same household. It makes no sense to think about 
how exogenous changes in one would affect the other. Further, suppose that we look at the 
effects of changes in local property tax rates. We would not want to hold fixed family saving 
and then measure the effect of changing property taxes on housing expenditures. When the 
property tax changes, a family will generally adjust expenditure in all categories. A SUR 
system with property tax as an explanatory variable is the appropriate strategy. 

f. No. These are both chosen by the firm, presumably to maximize profits. It makes no 
sense to hold advertising expenditures fixed while looking at how other variables affect price 
markup. 

g. Yes. The outcome variables — quantity demanded and advertising expenditures — are 
determined by different economic agents. It makes sense to model quantity demanded as a 
function of advertising expenditures — reflecting that more exposure to the public can affect 
demand — and at the same time recognize that how much a firm spends on advertising can be 
determined by how much of the product it can sell. 

h. Yes. The rate of HIV infection is determined by many factors, with condom usage being 
one. We can easily imagine being interested in the effects of making condoms more available 
on the incidence of HIV. The second equation, which models demand for condoms as a 
function of HIV incidence, captures the idea that more people might use condoms as the risk of 
HIV infection increases. Each equation stands on its own. 


9.2. a. Write the system as

$$\begin{pmatrix} 1 & -\gamma_{12} \\ -\gamma_{21} & 1 \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} z_{(1)}\delta_{(1)} + u_1 \\ z_{(2)}\delta_{(2)} + u_2 \end{pmatrix}.$$

Unique solutions for $y_1$ and $y_2$ exist only if the matrix premultiplying $(y_1,y_2)'$ is nonsingular. But its determinant is $1 - \gamma_{12}\gamma_{21}$, so a necessary and sufficient condition for the reduced forms to exist is $\gamma_{12}\gamma_{21} \neq 1$.
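As a small numerical illustration of the determinant condition, the sketch below solves the two-equation system by Cramer's rule. The function name is hypothetical, and `c1` and `c2` stand in for $z_{(1)}\delta_{(1)} + u_1$ and $z_{(2)}\delta_{(2)} + u_2$.

```python
def reduced_form(gamma12, gamma21, c1, c2):
    """Solve y1 = gamma12*y2 + c1 and y2 = gamma21*y1 + c2 for (y1, y2)."""
    det = 1.0 - gamma12 * gamma21
    if det == 0.0:
        # The matrix premultiplying (y1, y2)' is singular.
        raise ValueError("gamma12 * gamma21 = 1: no reduced form exists")
    y1 = (c1 + gamma12 * c2) / det
    y2 = (gamma21 * c1 + c2) / det
    return y1, y2


# Hypothetical values: the solution satisfies both structural equations.
y1, y2 = reduced_form(0.5, 0.5, 1.0, 2.0)
print(y1, y2)
```

When $\gamma_{12}\gamma_{21} = 1$ the function raises an error, mirroring the failure of the reduced form to exist.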

b. The rank condition holds for the first equation if and only if $z_{(2)}$ contains an element not in $z_{(1)}$ and the coefficient in $\delta_{(2)}$ on that variable is not zero. Similarly, the rank condition holds for the second equation if and only if $z_{(1)}$ contains an element not in $z_{(2)}$ and the coefficient in $\delta_{(1)}$ on that variable is not zero.

9.3. a. We can apply part b of Problem 9.2. First, the only variable excluded from the support equation is the variable mremarr; since the support equation contains one endogenous variable, this equation is identified if and only if $\delta_{21} \neq 0$. This ensures that there is an exogenous variable shifting the mother’s reaction function that does not also shift the father’s reaction function.

The visits equation is identified if and only if at least one of finc and fremarr actually appears in the support equation; that is, we need $\delta_{11} \neq 0$ or $\delta_{12} \neq 0$.

b. Each equation can be estimated by 2SLS using instruments 
1, finc, fremarr, dist, mremarr. 


c. First, obtain the reduced form for visits:

$$visits = \pi_{20} + \pi_{21}finc + \pi_{22}fremarr + \pi_{23}dist + \pi_{24}mremarr + v_2.$$

Estimate this equation by OLS, and save the residuals, $\hat{v}_2$. Then, run the OLS regression

support on 1, visits, finc, fremarr, dist, $\hat{v}_2$

and do a (heteroskedasticity-robust) $t$ test that the coefficient on $\hat{v}_2$ is zero. If this test rejects we conclude that visits is in fact endogenous in the support equation.

d. There is one overidentifying restriction in the visits equation, assuming that $\delta_{11}$ and $\delta_{12}$ are both different from zero. Assuming homoskedasticity of $u_2$, the easiest way to test the overidentifying restriction is to first estimate the visits equation by 2SLS, as in part b. Let $\hat{u}_2$ be the 2SLS residuals. Then, run the auxiliary regression

$\hat{u}_2$ on 1, finc, fremarr, dist, mremarr;

the sample size times the usual R-squared from this regression is distributed asymptotically as $\chi^2_1$ under the null hypothesis that all instruments are exogenous.

A heteroskedasticity-robust test is also easy to obtain. Let $\widehat{support}$ denote the fitted values from the reduced form regression for support. Next, regress finc (or fremarr) on $\widehat{support}$, mremarr, dist, and save the residuals, say $\hat{r}_1$. Then, run the simple regression (without intercept) of $\hat{u}_2$ on $\hat{r}_1$ and use the heteroskedasticity-robust $t$ statistic on $\hat{r}_1$. (Note that no intercept is needed in this final regression, but including one is harmless.)

9.4. a. Because the third equation contains no right-hand-side endogenous variables, a reduced form exists for the system if and only if the first two equations can be solved for $y_1$ and $y_2$ as functions of $y_3, z_1, z_2, z_3, u_1$, and $u_2$. But this is equivalent to asking when the system

$$\begin{pmatrix} 1 & -\gamma_{12} \\ 1 & -\gamma_{22} \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$$

has a unique solution in $y_1$ and $y_2$. This matrix is nonsingular if and only if $\gamma_{12} \neq \gamma_{22}$. This implies that the $3 \times 3$ matrix $\Gamma$ in the general SEM notation is nonsingular.

b. The third equation satisfies the rank condition because it includes no right-hand-side endogenous variables. The first equation fails the order condition because there are no excluded exogenous variables in it, but there is one included endogenous variable. This means it fails the rank condition also. The second equation is just identified according to the order condition because it contains two endogenous variables and also excludes two exogenous variables. To examine the rank condition, write the second equation as $y\gamma_2 + z\delta_2 + u_2 = 0$, where $\gamma_2 = (-1,\gamma_{22},\gamma_{23})'$ and $\delta_2 = (\delta_{21},0,0)'$. Write $\beta_2 = (-1,\gamma_{22},\gamma_{23},\delta_{21},\delta_{22},\delta_{23})'$ as the vector of parameters for the second equation with only the normalization $\gamma_{21} = -1$ imposed. Then, the restrictions $\delta_{22} = 0$ and $\delta_{23} = 0$ can be written as $R_2\beta_2 = 0$, where

$$R_2 = \begin{pmatrix} 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$

Now letting $B$ be the $6 \times 3$ matrix of all parameters, and imposing all exclusion restrictions in the system,

$$R_2B = \begin{pmatrix} \delta_{12} & 0 & \delta_{32} \\ \delta_{13} & 0 & \delta_{33} \end{pmatrix}.$$

The rank condition requires this matrix have rank equal to two. Provided the vector $(\delta_{32},\delta_{33})'$ is not a multiple of $(\delta_{12},\delta_{13})'$, or $\delta_{12}\delta_{33} \neq \delta_{13}\delta_{32}$, the rank condition is satisfied.
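The rank check can be made concrete with a short sketch that builds $R_2B$ from hypothetical coefficient values and computes its rank from the $2 \times 2$ minors. The function names and the numbers are illustrative assumptions, and the matrix layout assumes, as in the solution above, that $\delta_{22} = \delta_{23} = 0$.

```python
def rank_2xn(m):
    """Rank of a 2 x n matrix given as a list of two rows."""
    r1, r2 = m
    n = len(r1)
    for j in range(n):
        for k in range(j + 1, n):
            # A nonzero 2 x 2 minor means the two rows are independent.
            if r1[j] * r2[k] - r1[k] * r2[j] != 0:
                return 2
    return 1 if any(v != 0 for v in r1 + r2) else 0


def r2_times_b(d12, d13, d32, d33):
    """Rows of B for z2 and z3, with delta22 = delta23 = 0 imposed."""
    return [[d12, 0.0, d32], [d13, 0.0, d33]]


# Rank 2 when d12*d33 != d13*d32; rank 1 when the columns are proportional.
print(rank_2xn(r2_times_b(1.0, 2.0, 3.0, 4.0)))
print(rank_2xn(r2_times_b(1.0, 2.0, 2.0, 4.0)))
```

Because the middle column is zero, the only minor that can be nonzero is $\delta_{12}\delta_{33} - \delta_{13}\delta_{32}$, which is exactly the condition stated above.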


9.5. a. Let $\beta_1$ denote the $7 \times 1$ vector of parameters in the first equation with only the normalization restriction imposed:

$$\beta_1 = (-1,\gamma_{12},\gamma_{13},\delta_{11},\delta_{12},\delta_{13},\delta_{14})'.$$

The restrictions $\delta_{12} = 0$ and $\delta_{13} + \delta_{14} = 1$ are obtained by choosing

$$R_1 = \begin{pmatrix} 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix}.$$

Because $R_1$ has two rows, and $G - 1 = 2$, the order condition is satisfied. Now we need to check the rank condition. Letting $B$ denote the $7 \times 3$ matrix of all structural parameters with only the three normalizations, straightforward matrix multiplication gives

$$R_1B = \begin{pmatrix} \delta_{12} & \delta_{22} & \delta_{32} \\ \delta_{13} + \delta_{14} - 1 & \delta_{23} + \delta_{24} - \gamma_{21} & \delta_{33} + \delta_{34} - \gamma_{31} \end{pmatrix}.$$

By definition of the constraints on the first equation, the first column of $R_1B$ is zero. Next, we use the constraints in the remainder of the system to get the expression for $R_1B$ with all information imposed. But $\gamma_{23} = 0$, $\delta_{22} = 0$, $\delta_{23} = 0$, $\delta_{24} = 0$, $\gamma_{31} = 0$, and $\gamma_{32} = 0$, and so $R_1B$ becomes

$$R_1B = \begin{pmatrix} 0 & 0 & \delta_{32} \\ 0 & -\gamma_{21} & \delta_{33} + \delta_{34} \end{pmatrix}.$$

Identification requires $\gamma_{21} \neq 0$ and $\delta_{32} \neq 0$.

b. It is easy to see how to estimate the first equation under the given assumptions. Set $\delta_{14} = 1 - \delta_{13}$ and plug into the equation. After simple algebra we get

$$y_1 - z_4 = \gamma_{12}y_2 + \gamma_{13}y_3 + \delta_{11}z_1 + \delta_{13}(z_3 - z_4) + u_1.$$

This equation can be estimated by 2SLS using instruments $(z_1,z_2,z_3,z_4)$. Note that, if we just count instruments, there are just enough instruments to estimate this equation.
9.6. a. If $\gamma_{13} = 0$ then the two equations constitute a linear SEM. In that case, the first equation is identified if and only if $\delta_{23} \neq 0$ and the second equation is identified if and only if $\delta_{12} \neq 0$.

b. If we plug the second equation into the first we obtain

$$(1 - \gamma_{12}\gamma_{21} - \gamma_{13}\gamma_{21}z_1)y_1 = (\delta_{10} + \gamma_{12}\delta_{20}) + (\gamma_{12}\delta_{21} + \gamma_{13}\delta_{20} + \delta_{11})z_1$$
$$\qquad + \gamma_{13}\delta_{21}z_1^2 + \delta_{12}z_2 + \gamma_{12}\delta_{23}z_3 + \gamma_{13}\delta_{23}z_1z_3 + u_1 + (\gamma_{12} + \gamma_{13}z_1)u_2.$$

This can be solved for $y_1$ provided $(1 - \gamma_{12}\gamma_{21} - \gamma_{13}\gamma_{21}z_1) \neq 0$. Given the solution for $y_1$, we can use the second equation to get $y_2$. Note that both are nonlinear in $z_1$ unless $\gamma_{13} = 0$.

c. Since $E(u_1|z) = E(u_2|z) = 0$, we can use part (b) to get

$$E(y_1|z_1,z_2,z_3) = [(\delta_{10} + \gamma_{12}\delta_{20}) + (\gamma_{12}\delta_{21} + \gamma_{13}\delta_{20} + \delta_{11})z_1$$
$$\qquad + \gamma_{13}\delta_{21}z_1^2 + \delta_{12}z_2 + \gamma_{12}\delta_{23}z_3 + \gamma_{13}\delta_{23}z_1z_3]/(1 - \gamma_{12}\gamma_{21} - \gamma_{13}\gamma_{21}z_1).$$

Again, this is a nonlinear function of the exogenous variables appearing in the system unless $\gamma_{13} = 0$. If $\gamma_{21} = 0$, $E(y_1|z_1,z_2,z_3)$ becomes linear in $z_2$ and quadratic in $z_1$ and $z_3$.
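The reduced form in part b can be checked numerically. The sketch below assumes the structural equations implied by that expression, namely $y_1 = \delta_{10} + \gamma_{12}y_2 + \gamma_{13}y_2z_1 + \delta_{11}z_1 + \delta_{12}z_2 + u_1$ and $y_2 = \delta_{20} + \gamma_{21}y_1 + \delta_{21}z_1 + \delta_{23}z_3 + u_2$. The function name, dictionary keys, and parameter values are hypothetical.

```python
def solve_system(p, z1, z2, z3, u1=0.0, u2=0.0):
    """Return (y1, y2) solving the two structural equations.

    p maps hypothetical parameter names (g12, g13, g21, d10, ...) to values.
    """
    denom = 1.0 - p["g12"] * p["g21"] - p["g13"] * p["g21"] * z1
    if denom == 0.0:
        raise ValueError("no solution for y1 at this value of z1")
    numer = ((p["d10"] + p["g12"] * p["d20"])
             + (p["g12"] * p["d21"] + p["g13"] * p["d20"] + p["d11"]) * z1
             + p["g13"] * p["d21"] * z1 ** 2
             + p["d12"] * z2
             + p["g12"] * p["d23"] * z3
             + p["g13"] * p["d23"] * z1 * z3
             + u1 + (p["g12"] + p["g13"] * z1) * u2)
    y1 = numer / denom
    # Given y1, the second structural equation delivers y2 directly.
    y2 = p["d20"] + p["g21"] * y1 + p["d21"] * z1 + p["d23"] * z3 + u2
    return y1, y2


params = {"g12": 0.5, "g13": 0.25, "g21": 0.5, "d10": 1.0, "d11": 1.0,
          "d12": 1.0, "d20": 1.0, "d21": 1.0, "d23": 1.0}
print(solve_system(params, 1.0, 2.0, 3.0, u1=0.5, u2=-0.5))
```

Setting `g13` to zero makes the denominator constant in `z1`, which is the linear case from part a.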

d. If $\gamma_{13} = 0$, we saw in part a that the first equation is identified. If we include $\gamma_{13}y_2z_1$ in the model, we need at least one instrument for it. But regardless of the value of $\gamma_{13}$, terms $z_1^2$ and $z_1z_3$ (as well as many other nonlinear functions of $z$) are partially correlated with $y_2z_1$. In other words, the linear projection of $y_2z_1$ onto $1, z_1, z_2, z_3, z_1^2$, and $z_1z_3$ will, except by fluke, depend on at least one of the last two terms. In any case, we can test this using OLS with $y_2z_1$ as the dependent variable and a heteroskedasticity-robust test of two exclusion restrictions. Identification of the second equation is no problem, as $z_2$ is always available as an IV for $y_1$. To enhance efficiency when $\gamma_{13} \neq 0$, we could add $z_1^2$ and $z_1z_3$ (say) to the instrument list.

e. We could use IVs $(1, z_1, z_2, z_3, z_1^2, z_1z_3)$ in estimating the equation

$$y_1 = \delta_{10} + \gamma_{12}y_2 + \gamma_{13}y_2z_1 + \delta_{11}z_1 + \delta_{12}z_2 + u_1$$

by 2SLS, which implies a single overidentifying restriction. We can add other IVs ($z_2^2$, $z_3^2$, $z_1z_2$, and $z_2z_3$ seem natural) or even reciprocals, such as $1/z_1$ (or $1/(1 + |z_1|)$ if $z_1$ can equal zero).

f. We can use the instruments in part e for both equations. With a large sample size we 
might expand the list of IVs as discussed in part e. 

g. Technically, the parameters in the first equation can be consistently estimated if $\gamma_{13} \neq 0$ because $E(y_2|z)$ is a nonlinear function of $z$, and so $z_1^2$, $z_1z_2$, and other nonlinear functions would generally be partially correlated with $y_2$ and $y_2z_1$. But, if $\gamma_{13} = 0$ also, $E(y_2|z)$ is linear in $z_1$ and $z_2$, and additional nonlinear functions are not partially correlated with $y_2$; thus, there is no instrument for $y_2$. Since the equation is not identified when $\gamma_{13} = 0$ (and $\delta_{23} = 0$), $H_0: \gamma_{13} = 0$ cannot be tested.


9.7. a. Because alcohol and educ are endogenous in the first equation, we need at least two elements in $(z_{(2)},z_{(3)})$ that are not also in $z_{(1)}$. Ideally, we have at least one such element in $z_{(2)}$ and at least one such element in $z_{(3)}$.

b. Let $z$ denote all nonredundant exogenous variables in the system. Then use these as instruments in a 2SLS analysis.

c. The matrix of instruments for each $i$ is

$$Z_i = \begin{pmatrix} z_i & 0 & 0 \\ 0 & (z_i, educ_i) & 0 \\ 0 & 0 & z_i \end{pmatrix}.$$

d. $z_{(3)} = z$. That is, we should not make any exclusion restrictions in the reduced form for educ.


9.8. a. I interact nearc4 with experience and its quadratic, and the race indicator. The Stata 


output follows. 


use card 
gen educsq = educ^2 
gen nearc4exper = nearc4*exper 
gen nearc4expersq = nearc4*expersq 
gen nearc4black = nearc4*black 
reg educsq exper expersq black south smsa reg661-reg668 smsa66 nearc4 
nearc4exper nearc4expersq nearc4black, robust 
Linear regression                                      Number of obs =    3010
                                                       F( 18,  2991) =
                                                       Prob > F      =
                                                       R-squared     =
                                                       Root MSE      =

educsq | Coef 

je) pe Mi pl a + 
exper | -18.01791 
expersq | . 3700966 
black | -21.04009 
south | -.5738389 
smsa | 10.38892 
reg661 | -6.175308 
reg662 | -6.092379 
reg663 | -6.193772 
reg664 | -3.413348 
reg665 | -12.31649 
reg666 | -13.27102 
reg667 | -10.83381 
reg668 | 8.427749 
smsa66 | -.4621454 
nearc4 | -12.25914 
nearc4exper | 4.192304 
nearc4expe~q | -.1623635 
nearc4black | -4.789202 
_cons | 307.212 


Robust 


Std. Err. 


1.229128 
.058167 
. 569591 
.973465 
. 036816 
.574484 
.254714 
.010618 
. 069994 
. 439968 
. 693005 
.814901 
.627727 
. 058084 
012394 
1.55785 
0753242 
4.247869 
6.617862 


NOONAN OBR BOW WW 


-20.42793 
. 2560452 
-28.03919 


-22.23542 
-4.567616 
-6.458307 
-26.00874 
1.137738 
- . 310056 
-13.11824 
294.2359 


-15.60789 
. 4841479 
-14.04098 
7.217162 
16 . 34338 
4.754903 
2.250083 
1.670077 
6.527681 
.650031 


test nearc4exper nearc4expersq nearc4black

 ( 1)  nearc4exper = 0
 ( 2)  nearc4expersq = 0
 ( 3)  nearc4black = 0

       F(  3,  2991) =    3.72
            Prob > F =    0.0110


ivreg lwage exper expersq black south smsa reg661-reg668 smsa66 
(educ educsq = nearc4 nearc4exper nearc4expersq nearc4black) 


Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F( 16,  2993) =
       Model |  116.731381    16  7.29571132           Prob > F      =
    Residual |  475.910264  2993  .159007773           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |  592.641645  3009  .196956346           Root MSE      =

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t      [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .3161298   .1457578     2.17     .0303342    .6019254
      educsq |  -.0066592   .0058401    -1.14    -.0181103    .0047918
       exper |   .0840117   .0361077     2.33     .0132132    .1548101
     expersq |  -.0007825   .0014221    -0.55    -.0035709    .0020058
       black |  -.1360751   .0455727    -2.99    -.2254322   -.0467181
       south |   -.141488   .0279775             -.1963451   -.0866308
        smsa |   .1072011   .0290324              .0502755    .1641267
      reg661 |  -.1098848   .0428194             -.1938432   -.0259264
      reg662 |   .0036271   .0325364             -.0601688    .0674231
      reg663 |   .0428246   .0315082             -.0189554    .1046045
      reg664 |  -.0639842   .0391843             -.1408151    .0128468
      reg665 |   .0480365   .0445934             -.0394003    .1354734
      reg666 |   .0672512   .0498043             -.0304028    .1649052
      reg667 |   .0347783   .0471451             -.0576617    .1272183
      reg668 |  -.1933844   .0512395             -.2938526   -.0929161
      smsa66 |   .0089666   .0222745             -.0347083    .0526414
       _cons |   2.610889   .9706341              .7077116    4.514067
------------------------------------------------------------------------------
Instrumented:  educ educsq
Instruments:   exper expersq black south smsa reg661 reg662 reg663 reg664
               reg665 reg666 reg667 reg668 smsa66 nearc4 nearc4exper
               nearc4expersq nearc4black


The heteroskedasticity-robust Wald test, reported in the form of an F statistic, shows that the three interaction terms are partially correlated with $educ^2$: the p-value = .011. (Whether the partial correlation is strong enough is a reasonable concern.)

The 2SLS estimate of the coefficient on $educ^2$ is −.0067 with $t = -1.14$. Without stronger evidence, we can safely leave $educ^2$ out of the wage equation.

b. If $E(u_1|z) = 0$, as we would typically assume, then any function of $z$ is uncorrelated with $u_1$, including interactions of the form $black \cdot z_j$ for any exogenous variable $z_j$. Such interactions are likely to be correlated with $black \cdot educ$ if $z_j$ is correlated with educ.

c. The 2SLS estimates, first using $black \cdot nearc4$ as the IV for $black \cdot educ$ and then using $black \cdot \widehat{educ}$ as the IV, are given by the Stata output. The heteroskedasticity-robust standard errors are computed. The standard error using $black \cdot \widehat{educ}$ as the IV is much smaller than the standard error using $black \cdot nearc4$ as the IV. (The point estimate is also substantially higher.)


ivreg lwage exper expersq black south smsa reg661-reg668 smsa66
    (educ blackeduc = nearc4 nearc4black), robust

Instrumental variables (2SLS) regression


Number of obs = 


F( 16, 2993) 
Prob > F 
R-squared 
Root MSE 


educ 
blackeduc 
exper 
expersq 
black 
south 
smsa 
reg661 
reg662 
reg663 
reg664 
reg665 
reg666 


Robust 


Std. Err. 


.1273557 
. 0109036 
. 1059116 
- .0022406 
- .282765 
- . 1424762 
.1111555 
- .1103479 
- .0081783 
. 0382413 
- .0600379 
. 0337805 
.0498975 
.0216942 
- . 1908353 
.0180009 
3.84499 


. 0561622 
0399278 
0249463 
. 0004902 
5012131 
0298942 
.0310592 
.0418554 
. 0339196 
. 0335008 
. 0398032 
.0519109 
.0559569 
.0528376 
.0506182 
.0205709 
. 9545666 


.0172352 
.0673851 
. 0569979 
. 0032017 
1.265522 
. 2010914 

. 050256 
1924161 
.0746863 
.0274456 
. 1380824 
. 0680042 
.0598204 
.0819075 
. 2900853 
.0223337 
1.973317 


.2374762 
.0891923 
. 1548253 
- .0012794 
. 6999922 
- .083861 
1720551 
- .0282797 
. 0583298 
. 1039283 
.0180066 
. 1355652 
.1596155 
.1252959 
- .0915853 
. 0583356 
5.716663 


educ blackeduc 
exper expersq black south smsa reg661 reg662 reg663 reg664 


Instrumented: 
Instruments: 


reg665 reg666 reg667 reg668 smsa66 nearc4 nearc4black 


ivreg lwage exper expersq black south smsa reg661-reg668 smsa66
    (educ blackeduc = nearc4 blackeduchat), robust

Instrumental variables (2SLS) regression


Number of obs = 


F( 


16, 2993) 


Prob > F 
R-squared 
Root MSE 


educ 
blackeduc 
exper 
expersq 
black 
south 
smsa 
reg661 
reg662 
reg663 
reg664 
reg665 
reg666 


Robust 


Std. Err. 


. 1178141 
. 035984 
. 1004843 
- .0020235 
- .5955669 
-.1374265 
. 1096541 
- .1161759 
- .0107817 
. 0331736 
- .064916 
023022 
0379568 
. 0100466 
- .1907066 
.0167814 
4.00836 


. 0554036 
.0105707 
.0241951 
. 0003597 
.1587782 
.0294259 
.0306748 
.0409317 
. 0335743 
.0326007 
. 0388398 
.0505787 
. 0534653 
.0513629 
.0502527 
.0203639 
.9416251 


.0091811 
.0152573 
. 0530436 
.0027288 
. 9068923 
.1951236 
.0495083 
- . 196433 
.0766127 
.0307484 
.1410715 
.0761506 
-0668757 
.0906637 
. 2892399 
.0231472 
2.162062 


. 226447 
.0567106 
147925 
- .0013183 
- .2842415 
- .0797294 
. 1697998 
- .0359189 
.0550494 
.0970955 
0112395 
. 1221946 
.1427892 
.1107568 
- .0921733 
.0567101 
5.854658 


Instrumented: educ blackeduc 
Instruments: exper expersq black south smsa reg661 reg662 reg663 reg664 
reg665 reg666 reg667 reg668 smsa66 nearc4 blackeduchat 


d. Suppose $E(educ|z) = z\pi_2$ and $\text{Var}(u_1|z) = \sigma_1^2$. Then by Theorem 8.5, the optimal IVs for educ and $black \cdot educ$ are

$$\sigma_1^{-2}E(educ|z) = \sigma_1^{-2}z\pi_2$$
$$\sigma_1^{-2}E(black \cdot educ|z) = \sigma_1^{-2}black \cdot E(educ|z) = \sigma_1^{-2}black \cdot (z\pi_2).$$

We can drop the constant $\sigma_1^{-2}$, and so the optimal IVs can be taken to be

$$[z_1, z\pi_2, black \cdot (z\pi_2)].$$

When we operationalize this procedure, we can use

$$[z_{i1}, z_i\hat{\pi}_2, black_i \cdot (z_i\hat{\pi}_2)] = [z_{i1}, \widehat{educ}_i, black_i \cdot \widehat{educ}_i]$$

as the optimal IVs. Nothing is lost asymptotically by including $black_i \cdot \widehat{educ}_i$ in the reduced form for $educ_i$ along with $z_i$. So using 2SLS with IVs $[z_i, black_i \cdot \widehat{educ}_i]$ produces the asymptotically efficient IV estimator.

9.9. a. The Stata output for 3SLS estimation of (9.28) and (9.29), along with the output for 2SLS on each equation, is given below. For coefficients that are statistically significant, the 3SLS and 2SLS estimates are reasonably close. For coefficients estimated imprecisely, there are some differences between 2SLS and 3SLS, but these are not unexpected. Generally, the 3SLS standard errors are smaller (but recall that none of the standard errors is robust to heteroskedasticity).


reg3 (hours lwage educ age kidslt6 kidsge6 nwifeinc) 
(lwage hours educ exper expersq) 


Three-stage least-squares regression 


hours         428      6    1368.362   -2.1145     34.54   0.0000
lwage                       .6892584    0.0895     79.87   0.0000


age 
kidsl1t6 
kidsge6 
nwifeinc 
_cons 


exper 


_cons 


1676.933 
-205.0267 
-12.28121 
-200.5673 
-48 . 63986 


431.169 
51.84729 
8.261529 


831.8577 
-306.6455 
-28.47351 
-463.7287 
-119.1032 
-6.396957 

1454.47 


2522.009 
-103.4078 
3.911094 


.000201 
.1129699 
.0208906 

- .0002943 
- . 7051103 


.0002109 
.0151452 
.0142782 
.0002614 
. 3045904 


- .0002123 

. 0832858 
- .0070942 
- .0008066 
-1.302097 


.0006143 
. 1426539 
.0488753 
.000218 
- . 1081241 


Endogenous variables: 
Exogenous variables: 


hours lwage 
educ age kidslt6 kidsge6 


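ivreg hours educ age kidslt6 kidsge6 nwifeinc (lwage = exper expersq)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  6,   421) =
       Model |  -456272250     6   -76045375           Prob > F      =  0.0027
    Residual |   713583270   421  1694972.14           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |   257311020   427   602601.92           Root MSE      =

------------------------------------------------------------------------------
       hours |      Coef.   Std. Err.      t      [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |   1544.819   480.7387     3.21     599.8713    2489.766
        educ |   -177.449    58.1426    -3.05    -291.7349   -63.16302
         age |  -10.78409   9.577347    -1.13    -29.60946    8.041289
     kidslt6 |  -210.8339    176.934    -1.19    -558.6179    136.9501
     kidsge6 |  -47.55708   56.91786    -0.84    -159.4357     64.3215
    nwifeinc |  -9.249121   6.481116    -1.43     -21.9885    3.490256
       _cons |   2432.198   594.1719     4.09     1264.285    3600.111
------------------------------------------------------------------------------
Instrumented:  lwage
Instruments:   educ age kidslt6 kidsge6 nwifeinc exper expersq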


ivreg lwage educ exper expersq (hours = age kidslt6 kidsge6 nwifeinc)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  4,   423) =   18.80
       Model |  24.8336445     4  6.20841113           Prob > F      =  0.0000
    Residual |  198.493796   423  .469252474           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |  223.327441   427  .523015084           Root MSE      =

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hours |   .0001608   .0002154    0.456    -.0002626    .0005842
        educ |   .1111175   .0153319    0.000     .0809814    .1412536
       exper |    .032646    .018061    0.071    -.0028545    .0681465
     expersq |  -.0006765   .0004426    0.127    -.0015466    .0001935
       _cons |    -.69279   .3066002    0.024     -1.29544   -.0901403
------------------------------------------------------------------------------
Instrumented:  hours
Instruments:   educ exper expersq age kidslt6 kidsge6 nwifeinc


More efficient GMM estimators can be obtained, along with robust standard errors. The 


weighting matrix is the optimal one that allows for system heteroskedasticity. The (valid) 


standard errors from the GMM estimation are quite a bit larger than the usual 3SLS standard 


errors. The coefficient estimates change some, but the magnitudes are similar and all 


qualitative conclusions hold. 


gmm (hours - {b1}*lwage - {b2}*educ - {b3}*age - {b4}*kidslt6 -
        {b5}*kidsge6 - {b6}*nwifeinc - {b7})
    (lwage - {b8}*hours - {b9}*educ - {b10}*age - {b11}*exper
        - {b12}*expersq - {b13}),
    instruments(educ age kidslt6 kidsge6 nwifeinc exper expersq)
    winitial(identity)

Step 1
Iteration 0:   GMM criterion Q(b) =  1.339e+11
Iteration 1:   GMM criterion Q(b) =  350.92722
Iteration 2:   GMM criterion Q(b) =  350.92722

Step 2
Iteration 0:   GMM criterion Q(b) =  .08810078
Iteration 1:   GMM criterion Q(b) =  .00317682
Iteration 2:   GMM criterion Q(b) =  .00317682

GMM estimation

Number of parameters =  13
Number of moments    =  16
Initial weight matrix: Identity                       Number of obs  =     428
GMM weight matrix:     Robust
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z      [95% Conf. Interval]
-------------+----------------------------------------------------------------
         /b1 |    1606.63   618.7605     2.60                 2819.379
         /b2 |  -179.4037   69.66358             -315.9418   -42.86564
         /b3 |  -10.29206   10.88031             -31.61708    11.03295
         /b4 |  -238.2421   212.7398             -655.2044    178.7202
         /b5 |  -46.23438    59.7293             -163.3017    70.83289
         /b6 |  -10.39225   5.576519             -21.32203    .5375285
         /b7 |   2385.333   630.2282              1150.108    3620.558
         /b8 |   .0002704   .0003012               -.00032    .0008608
         /b9 |   .1153863   .0158272              .0843655     .146407
        /b10 |   .0031491     .00654             -.0096691    .0159673
        /b11 |   .0336343   .0246234             -.0146266    .0818952
        /b12 |  -.0008599   .0006453             -.0021247     .000405
        /b13 |  -.9910496   .5850041             -2.137637    .1555373
------------------------------------------------------------------------------
Instruments for equation 1: educ age kidslt6 kidsge6 nwifeinc exper expersq
    _cons
Instruments for equation 2: educ age kidslt6 kidsge6 nwifeinc exper expersq
    _cons

b. Using $\gamma$ as coefficients on endogenous variables, the three-equation system can be expressed as

$$hours = \gamma_{12}lwage + \gamma_{13}educ + \delta_{11} + \delta_{12}age + \delta_{13}kidslt6 + \delta_{14}kidsge6 + \delta_{15}nwifeinc + u_1$$

$$lwage = \gamma_{21}hours + \gamma_{23}educ + \delta_{21} + \delta_{22}age + \delta_{23}exper + \delta_{24}exper^2 + u_2$$

$$educ = \delta_{31} + \delta_{32}age + \delta_{33}kidslt6 + \delta_{34}kidsge6 + \delta_{35}nwifeinc + \delta_{36}exper + \delta_{37}exper^2$$
$$\qquad + \delta_{38}motheduc + \delta_{39}fatheduc + \delta_{3,10}huseduc + u_3$$

The IVs for the first equation are all variables appearing in the third equation, plus educ. For the second and third equations, the valid IVs are all variables appearing in the third equation (which is also a reduced form for educ).

The following Stata output produces the GMM estimates using the optimal weighting matrix, along with the valid standard errors.


gmm (hours - {b1}*lwage - {b2}*educ - {b3}*age - {b4}*kidslt6 -
        {b5}*kidsge6 - {b6}*nwifeinc - {b7})
    (lwage - {b8}*hours - {b9}*educ - {b10}*age - {b11}*exper
        - {b12}*expersq - {b13})
    (educ - {b14}*age - {b15}*kidslt6 - {b16}*kidsge6 - {b17}*nwifeinc
        - {b18}*exper - {b19}*expersq - {b20}*motheduc - {b21}*fatheduc
        - {b22}*huseduc - {b23}),
    instruments(age kidslt6 kidsge6 nwifeinc exper expersq
        motheduc fatheduc huseduc)
    instruments(1: educ) winitial(identity)

Step 1
Iteration 0:   GMM criterion Q(b) =  1.344e+11
Iteration 1:   GMM criterion Q(b) =  116344.22
Iteration 2:   GMM criterion Q(b) =  116344.22

Step 2
Iteration 0:   GMM criterion Q(b) =  .22854932
Iteration 1:   GMM criterion Q(b) =   .0195166
Iteration 2:   GMM criterion Q(b) =   .0195166

GMM estimation

Number of parameters =  23
Number of moments    =  31
Initial weight matrix: Identity                       Number of obs  =     428
GMM weight matrix:     Robust

Coef.

             |               Robust
             |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
          b1 |   1195.924   383.1797                  1946.942
          b2 |  -140.7006   46.54044                 -49.48298
          b3 |  -9.160326   8.687135                  7.866146
          b4 |  -298.5054    169.207                  33.13422
          b5 |  -72.19858   43.44575                  12.95353
          b6 |  -6.128558     3.9233                  1.560969
          b7 |   2297.594   511.7623      1294.559     3300.63
          b8 |   .0002455   .0002852     -.0003134    .0008044
          b9 |   .1011445   .0233025      .0554724    .1468167
         b10 |   .0007376   .0061579     -.0113317    .0128069
         b11 |    .031704   .0194377     -.0063933    .0698013
         b12 |   -.000695   .0003921     -.0014635    .0000735
         b13 |  -.6841501   .6119956     -1.883639    .5153392
         b14 |  -.0061529   .0135359     -.0326827    .0203769
         b15 |   .5229541    .209776      .1118006    .9341075
         b16 |  -.1128699    .070775     -.2515864    .0258465
         b17 |   .0273278   .0093602      .0089821    .0456735
         b18 |   .0284576   .0333103     -.0368295    .0937447
         b19 |  -.0001177   .0010755     -.0022256    .0019902
         b20 |   .1226775   .0302991      .0632923    .1820627


Instruments for equation 1: educ age kidslt6 kidsge6 nwifeinc exper expersq
    motheduc fatheduc huseduc _cons
Instruments for equation 2: age kidslt6 kidsge6 nwifeinc exper expersq
    motheduc fatheduc huseduc _cons
Instruments for equation 3: age kidslt6 kidsge6 nwifeinc exper expersq
    motheduc fatheduc huseduc _cons


9.10. a. No. 2SLS estimation of the first equation uses all nonredundant elements of $z_1$ and
$z_2$ (call these $z$) in the first stage regression for $y_2$. Therefore, the exclusion restrictions
in the second equation are not imposed.

b. No, except in the extremely rare case where the covariance between the structural errors
is estimated to be zero. (If we impose a zero covariance, then the 2SLS estimates and 3SLS
estimates will be the same.) Effectively, each equation, including the second, is
overidentified.

c. This just follows from the first two parts. Because 2SLS puts no restrictions on the 
reduced form for y2, whereas 3SLS assumes only z2 appears in the reduced form for y2, 2SLS 
will be more robust for estimating the parameters in the first equation. 

9.11. a. Because $z_2$ and $z_3$ are both omitted from the first equation, we just need
$\delta_{22} \neq 0$ or $\delta_{23} \neq 0$. The second equation is identified if and only if $\delta_{11} \neq 0$.

b. After substitution and straightforward algebra, it can be seen that
$\pi_{11} = \delta_{11}/(1 - \gamma_{12}\gamma_{21})$.

c. We can estimate the system by 3SLS; for the second equation, this is identical to 2SLS
since it is just identified. Or, we could just use 2SLS on each equation. Given $\hat{\delta}_{11}$,
$\hat{\gamma}_{12}$, and $\hat{\gamma}_{21}$, we would form $\hat{\pi}_{11} = \hat{\delta}_{11}/(1 - \hat{\gamma}_{12}\hat{\gamma}_{21})$.

d. Whether we estimate the parameters by 2SLS or 3SLS, we will generally inconsistently
estimate $\delta_{11}$ and $\gamma_{12}$. (We are estimating the second equation by 2SLS so we will still
consistently estimate $\gamma_{21}$ provided we have not misspecified this equation.) So our estimate of
$\pi_{11} = \partial E(y_2|z)/\partial z_1$ will be inconsistent in any case.

e. We can just estimate the reduced form $E(y_2|z_1,z_2,z_3)$ by ordinary least squares.

f. Consistency of OLS for $\pi_{11}$ does not hinge on the validity of the exclusion restrictions in
the structural model, whereas using an SEM does. Of course, if the SEM is correctly specified,
we obtain a more efficient estimator of the reduced form parameters by imposing the
restrictions in estimating $\pi_{11}$.


9.12. a. Generally, $E(y_2^2|z) = Var(y_2|z) + [E(y_2|z)]^2$; when $\gamma_{13} = 0$ and $u_1$ and $u_2$ are
homoskedastic, $Var(y_2|z)$ is constant, say $\tau_2^2$. (This is easily seen from the reduced form for
$y_2$, which is linear when $\gamma_{13} = 0$.) Therefore, $E(y_2^2|z) = \tau_2^2 + (\pi_{20} + z\pi_2)^2$.

b. We do not really need to use part a; in fact, it turns out to be a red herring for this
problem. Since $\gamma_{13} = 0$,

$$E(y_1|z) = \delta_{10} + \gamma_{12}E(y_2|z) + z_1\delta_1 + E(u_1|z) = \delta_{10} + \gamma_{12}E(y_2|z) + z_1\delta_1.$$


c. When $\gamma_{13} = 0$, any nonlinear function of $z$, including $(\pi_{20} + z\pi_2)^2$, has zero coefficient in
$E(y_1|z) = \delta_{10} + \gamma_{12}(\pi_{20} + z\pi_2) + z_1\delta_1$. Plus, if $\gamma_{13} = 0$, then the parameters $\pi_{20}$ and $\pi_2$ are
consistently estimated from the first stage regression $y_{i2}$ on 1, $z_i$, $i = 1,...,N$. Therefore, the
regression $y_{i1}$ on 1, $(\hat{\pi}_{20} + z_i\hat{\pi}_2)$, $(\hat{\pi}_{20} + z_i\hat{\pi}_2)^2$, $z_{i1}$, $i = 1,...,N$ consistently estimates $\delta_{10}$,
$\gamma_{12}$, 0, and $\delta_1$, respectively. But this is just the regression $y_{i1}$ on 1, $\hat{y}_{i2}$, $(\hat{y}_{i2})^2$, $z_{i1}$, $i = 1,...,N$.

d. Because $E(u_1|z) = 0$ and $Var(u_1|z) = \sigma_1^2$, we can immediately apply Theorem 8.5 to
conclude that the optimal IVs for estimating the first equation are $[1, E(y_2|z), E(y_2^2|z), z_1]/\sigma_1^2$,
and we can drop the division by $\sigma_1^2$. But, if $\gamma_{13} = 0$, then $E(y_2|z)$ is linear in $z$ and, from part a,
$E(y_2^2|z) = \tau_2^2 + [E(y_2|z)]^2$. So the optimal IVs are a linear combination of
$\{1, E(y_2|z), [E(y_2|z)]^2, z_1\}$, which means they are a linear combination of $\{1, z, [E(y_2|z)]^2\}$. We
never do worse asymptotically by using more IVs, so we can use $\{1, z, [E(y_2|z)]^2\}$ as an
optimal set. Why would we use this larger set instead of $\{1, E(y_2|z), [E(y_2|z)]^2, z_1\}$? For one,
the larger set will generally yield overidentifying restrictions. In addition, if $\gamma_{13} \neq 0$, we will
generally be better off using more instruments: $z$ rather than only $L(y_2|1, z)$.

e. The estimates below are similar to those reported in Section 9.5.2, where we just added
$educ^2$, $age^2$, and $nwifeinc^2$ to the IV list and used 2SLS with lwage = log(wage) and
lwagesq = [log(wage)]^2 as endogenous explanatory variables. In particular, the coefficient on
lwagesq is still statistically insignificant. The standard errors reported here are robust to
heteroskedasticity (unlike in the text).


gen lwagesq = lwage^2

qui reg lwage educ age kidslt6 kidsge6 nwifeinc exper expersq

predict lwagehat
(option xb assumed; fitted values)

gen lwagehatsq = lwagehat^2

ivreg hours (lwage lwagesq = exper expersq lwagehatsq)
        educ age kidslt6 kidsge6 nwifeinc, robust

Instrumental variables (2SLS) regression

             |               Robust
       hours |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
       lwage |   1846.902   856.1346      164.0599    3529.745
     lwagesq |    -373.16   401.9586     -1163.261    416.9412
        educ |  -103.2347   71.99564     -244.7513    38.28199
         age |  -9.425115   8.848798     -26.81856    7.968333
     kidslt6 |  -187.0236   177.1703     -535.2747    161.2274
     kidsge6 |  -55.70163   45.86915     -145.8633    34.46007
    nwifeinc |    -7.5979   4.491138     -16.42581    1.230009
       _cons |   1775.847   881.2631      43.61095    3508.082

Instrumented:  lwage lwagesq
Instruments:   educ age kidslt6 kidsge6 nwifeinc exper expersq lwagehatsq


gen educsq = educ^2

gen agesq = age^2

gen nwifeincsq = nwifeinc^2

ivreg hours (lwage lwagesq = exper expersq educsq agesq nwifeincsq)
        educ age kidslt6 kidsge6 nwifeinc, robust

Instrumental variables (2SLS) regression

             |               Robust
       hours |      Coef.   Std. Err.    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------
       lwage |    1873.62   792.3005    0.018     316.2521    3430.989
     lwagesq |  -437.2911   313.7658    0.164    -1054.038    179.4559
        educ |   -87.8511   50.01226    0.080    -186.1566    10.45442
         age |  -9.142303   8.512639    0.283    -25.87499    7.590381
     kidslt6 |  -185.0554   179.9036    0.304    -538.6789    168.5682
     kidsge6 |  -58.18949    44.6767    0.193    -146.0073     29.6283
    nwifeinc |  -7.233422   4.037903    0.074    -15.17044    .7035957
       _cons |   1657.926   671.9361    0.014     337.1489    2978.702

Instrumented:  lwage lwagesq
Instruments:   educ age kidslt6 kidsge6 nwifeinc exper expersq educsq agesq
               nwifeincsq


9.13. a. The first equation is identified if, and only if, $\delta_{22} \neq 0$ (the rank condition).


b. Here is the Stata output: 


use openness

reg open lpcinc lland, robust

Linear regression

             |               Robust
        open |      Coef.   Std. Err.    P>|t|     [95% Conf. Interval]
-------------+-----------------------------------------------------------
      lpcinc |   .5464812   1.436115    0.704    -2.299276    3.392238
       lland |  -7.567103   1.141798    0.000    -9.829652   -5.304554
       _cons |   117.0845   18.24808    0.000     80.92473    153.2443


With t = -6.63, we can conclude that log(land) is very statistically significant in the
reduced form for open. The negative coefficient implies that smaller countries are more
“open.”


c. Here is the Stata output. First 2SLS, then OLS, both with heteroskedasticity-robust
standard errors.


ivreg inf (open = lland) lpcinc, robust

Instrumental variables (2SLS) regression          Number of obs =   114
                                                  F(  2,   111) =  2.53

             |               Robust
         inf |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
        open |  -.3374871   .1524489     -.6395748   -.0353994
      lpcinc |   .3758247   1.378542     -2.355848    3.107497
       _cons |   26.89934    10.9199      5.260821    48.53785

Instrumented:  open
Instruments:   lpcinc lland


reg inf open lpcinc, robust

Linear regression

             |               Robust
         inf |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
        open |  -.2150695   .0794571     -.3725191   -.0576199
      lpcinc |   .0175683   1.278747     -2.516354     2.55149
       _cons |   25.10403    9.99078      5.306636    44.90143


The IV estimate is larger in magnitude, by more than 50%, but its standard error is
almost twice as large as the OLS standard error. There is some but not overwhelming evidence
that open is actually endogenous. The variable-addition Hausman test, made robust to
heteroskedasticity, has t = 1.48. With N = 114, we might not expect very strong evidence.
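
As a quick arithmetic check on the comparison just made, the following Python sketch computes the ratio of the IV to the OLS coefficient on open and the ratio of their robust standard errors, using the values reported in the two outputs above. The variable names are illustrative.

```python
# Coefficient on open and its robust standard error, from the outputs above.
b_iv, se_iv = -0.3374871, 0.1524489     # 2SLS, lland as the IV for open
b_ols, se_ols = -0.2150695, 0.0794571   # OLS

coef_ratio = abs(b_iv) / abs(b_ols)     # relative size of the IV estimate
se_ratio = se_iv / se_ols               # relative size of the IV standard error

print(round(coef_ratio, 3), round(se_ratio, 3))
```

A coefficient ratio above 1.5 corresponds to "more than 50% larger," and a standard error ratio just under 2 corresponds to "almost twice as large."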


d. If we add $\gamma_{13}open^2$ to the equation, we need an IV for it. Since log(land) is partially
correlated with open, [log(land)]^2 is a natural candidate. A regression of $open^2$ on
log(land), [log(land)]^2, and log(pcinc) gives a heteroskedasticity-robust t statistic on
[log(land)]^2 of about 2. This is borderline, but we will go ahead. The Stata output for 2SLS is


ivreg inf (open opensq = lland llandsq) lpcinc, robust

Instrumental variables (2SLS) regression

             |               Robust
         inf |      Coef.   Std. Err.      t      [95% Conf. Interval]
-------------+----------------------------------------------------------
        open |  -1.198637   .7139934    -1.68    -2.613604    .2163303
      opensq |   .0075781   .0053779     1.41    -.0030796    .0182358
      lpcinc |   .5066092   1.490845     0.34    -2.447896    3.461114
       _cons |   43.17124   18.37223     2.35     6.761785     79.5807

Instrumented:  open opensq
Instruments:   lpcinc lland llandsq


The squared term indicates that the impact of open on inf diminishes; the estimate would be 


significant at about the 8.1% level against a one-sided alternative. 
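
To make the "diminishing impact" concrete, the estimated partial effect of open on inf in the quadratic model is the coefficient on open plus twice the coefficient on opensq times open. The Python sketch below evaluates this using the 2SLS point estimates above; the function name and the evaluation points (20, 40, 60) are illustrative choices, not values from the problem.

```python
# 2SLS point estimates from the quadratic specification above.
b_open = -1.198637
b_opensq = 0.0075781

def partial_effect(open_level):
    """Estimated d(inf)/d(open) at a given level of open."""
    return b_open + 2.0 * b_opensq * open_level

# Level of open at which the estimated partial effect reaches zero.
turning_point = -b_open / (2.0 * b_opensq)

for level in (20, 40, 60):  # illustrative evaluation points
    print(level, round(partial_effect(level), 4))
print(round(turning_point, 2))
```

The effect is negative at low levels of open and shrinks in magnitude as open rises, reaching zero at the turning point implied by the point estimates. Because opensq is imprecisely estimated, that turning point should be read loosely.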


e. Here is the Stata output for implementing the method described in the problem: 


qui reg open lpcinc lland

predict openhat
(option xb assumed; fitted values)

gen openhatsq = openhat^2

reg inf openhat openhatsq lpcinc, robust

Linear regression

             |               Robust
         inf |      Coef.   Std. Err.      t      [95% Conf. Interval]
-------------+----------------------------------------------------------
     openhat |  -.8648092   .5762007    -1.50    -2.006704    .2770854
   openhatsq |   .0060502   .0050906     1.19    -.0040383    .0161387
      lpcinc |   .0412172   1.293368     0.03    -2.521935    2.604369
       _cons |   39.17831   15.99614     2.45     7.477717     70.8789


Qualitatively, the results are similar to the appropriate IV method from part d, but the
coefficient on openhat is quite a bit smaller in magnitude using the “forbidden regression.” If
$\gamma_{13} = 0$, E(open|lpcinc, lland) is linear, and Var(open|lpcinc, lland) is constant then, as shown
in Problem 9.12, both methods are consistent. But the forbidden regression implemented in this
part is unnecessary, less robust, and we cannot trust the standard errors, anyway.

Incidentally, using openhatsq as an IV, rather than a regressor, gives very similar estimates
to using llandsq as an IV for opensq.

9.14. a. Agree. In equation (9.13), the reduced form variance matrix, $\Lambda = E(v'v)$, is always
identified. Now the structural variance matrix can be written as $\Sigma = \Gamma'\Lambda\Gamma$, so if $\Gamma$ and $\Lambda$ are
both identified, so is $\Sigma$.

b. Disagree. In many cases a linear version of the model is not identified because there are 
not enough instruments. In that case, identification of the more general model hinges on the 
model actually having nonlinearities, a tenuous situation. 

c. Disagree. $E(u|z) = 0$ implies that $z_h$ is uncorrelated with $u_g$ for all $h = 1,...,L$ and
$g = 1,...,G$. This kind of orthogonality, along with the rank condition, is sufficient for the
GMM, traditional, or GIV versions of 3SLS to be consistent. We need not restrict $Var(u|z)$ for
consistency, and robust inference is easily obtained.

d. Disagree. Even true SEMs can have other problems that cause endogeneity, namely,
omitted variables and measurement error. In these cases, some variables may be valid
instruments in one equation but not other equations.

e. Disagree. Control function approaches generally require more assumptions in order to be 
consistent. Take equations (9.70) and (9.71) as an example. The CF approach proposed there 
basically requires assumption (9.72), which can be very restrictive, especially if y2 or y3 
exhibit discreteness. By contrast, we can use an IV approach that specifies z and nonlinear 
functions of z as instruments in directly estimating (9.70) (by 2SLS or GMM). When they are 


consistent, we might expect CF approaches to be more efficient asymptotically.

9.15. a. The model with $\alpha_{13} = 0$, $\gamma_{11} = 0$, and $\gamma_{12} = 0$ is

$$y_1 = z_1\delta_1 + \alpha_{11}y_2 + \alpha_{12}y_3 + u_1$$

and we maintain that this equation is identified. Necessary is that $L \geq L_1 + 2$, as is assumed in
the problem. The rank condition is more complicated, and we assume it holds. In other words,
we assume we have enough relevant instruments for $y_2$ and $y_3$. If (9.71) also holds, and say
$E(u_2|z) = E(u_3|z) = 0$, then

$$E(y_2y_3|z) = (z\beta_2)(z\beta_3) + E(u_2u_3|z)$$
$$E(y_2z_1|z) = (z\beta_2)z_1$$
$$E(y_3z_1|z) = (z\beta_3)z_1$$


which means that squares and cross products in z are natural instruments for the interaction 
terms. Note that there are plenty of these interaction terms, and they are likely to provide 
enough variation because the linear version of the model is identified. For example, if zz has 
zero coefficient in the reduced form for y3, then zzz; appears in E(y2z,|z) but not in E(v3z;|z). 
Further, suppose $E(u_2u_3|z)$ is actually constant. Then squares of elements in $z_2$, where
$z = (z_1, z_2)$, appear in $E(y_2y_3|z)$ but not the other expectations. If $E(u_2u_3|z)$ is not constant, it
would cancel out with those squares only by fluke. Further, even if we only take (9.71) to be
linear projections, that would only help our cause because if these are not conditional
expectations then even more nonlinear functions of $z$ would be useful as instruments.

b. There are no overidentification restrictions. We have the same number of instruments as
explanatory variables. That is one drawback to using fitted values as IVs: there are no
overidentification restrictions to test.

c. There are $L - L_1 - 2$ overidentification restrictions. The $L_1$ subvector $z_1$ acts as its own
instruments, and we need two more elements of $z$ to instrument for $y_2$ and $y_3$. The rest of the
explanatory variables are taken care of by the interaction terms.

d. We can get many overidentification restrictions by using as IVs the nonredundant
elements of $(z_i, z_i \otimes z_i)$; that is, the levels, squares, and cross products of $z_i$. For example, if
$L = 5$, $L_1 = 2$, and $z$ and $z_1$ include a constant, then there are 5 + 4 + 6 = 15 instruments and
2 + 2 + 2 + 2 = 8 explanatory variables.
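
The instrument count in this example can be checked by enumerating the distinct products of pairs of elements of $z$. The Python sketch below assumes $L = 5$ with the first element equal to the constant, as in the example; the labels `z1` through `z4` for the nonconstant elements are illustrative.

```python
from itertools import combinations_with_replacement

L = 5  # number of elements in z, including the constant
names = ["1"] + [f"z{j}" for j in range(1, L)]

# Every product of two elements of z; multiplying by the constant
# reproduces a level, so a set removes the redundant terms.
terms = set()
for a, b in combinations_with_replacement(names, 2):
    factors = sorted(f for f in (a, b) if f != "1")
    terms.add("*".join(factors) if factors else "1")

levels = [t for t in terms if "*" not in t]  # constant plus z1..z4
squares = [t for t in terms if "*" in t and len(set(t.split("*"))) == 1]
cross = [t for t in terms if "*" in t and len(set(t.split("*"))) == 2]

print(len(levels), len(squares), len(cross), len(terms))
```

The three counts correspond to the 5 levels, 4 squares, and 6 cross products in the text.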

e. To solve this problem, we also need to assume $E(u_2u_3|z)$ is constant, and to explicitly
state that $z_1$ includes a constant. By Theorem 8.5, the optimal IVs are then given in part a,
along with $z_1$, because $Var(u_1|z) = \sigma_1^2$. If $E(u_2u_3|z)$ is constant then
$(z\beta_2)(z\beta_3) + E(u_2u_3|z) = (z\beta_2)(z\beta_3) + constant$. We operationalize the IVs by replacing $\beta_2$
and $\beta_3$ with the estimators from the first-stage regressions.

The estimators from parts c and d would also be asymptotically efficient because linear 
combinations of the instruments in those two parts are the optimal instruments. 
Asymptotically, it does not hurt to use redundant instruments; asymptotically, the optimal
linear combination will be picked out via the first-stage regression.

Solutions to Chapter 10 Problems

10.1. a. Because investment is likely to be affected by macroeconomic factors, it is
important to allow for these by including separate time intercepts; this is done by using $T - 1$
time period dummies.

b. Putting the unobserved effect $c_i$ in the equation is a simple way to account for
time-constant features of a county that affect investment and might also be correlated with the
tax variable. Something like “average” county economic climate, which affects investment,
could easily be correlated with tax rates because tax rates are, at least to a certain extent,
selected by state and local officials. If only a cross section were available, we would have to
find an instrument for the tax variable that is uncorrelated with $c_i$ and correlated with the tax
rate. This is often a difficult task.

c. Standard investment theories suggest that, ceteris paribus, larger marginal tax rates 
decrease investment. 

d. I would start with a fixed effects analysis to allow arbitrary correlation between all
time-varying explanatory variables and $c_i$. (Actually, doing pooled OLS is a useful initial
exercise; these results can be compared with those from an FE analysis.) Such an analysis
assumes strict exogeneity of $z_{it}$, $tax_{it}$, and $disaster_{it}$ in the sense that these are uncorrelated with
the errors $u_{is}$ for all $t$ and $s$.

I have no strong intuition for the likely serial correlation properties of the $\{u_{it}\}$. These
might have little serial correlation because we have allowed for $c_i$, in which case I would use
standard fixed effects. However, it seems more likely that the $u_{it}$ are positively autocorrelated,
in which case I might use first differencing instead. In either case, I would compute the fully
robust standard errors along with the usual ones. In either case we can test for serial correlation
in $\{u_{it}\}$.

e. If $tax_{it}$ and $disaster_{it}$ do not have lagged effects on investment, then the only possible
violation of the strict exogeneity assumption is if future values of these variables are correlated
with $u_{it}$. It seems reasonable not to worry whether future natural disasters are determined by
past investment. On the other hand, state officials might look at the levels of past investment in
determining future tax policy, especially if there is a target level of tax revenue the officials are
trying to achieve. This could be similar to setting property tax rates: sometimes property tax
rates are set depending on recent housing values because a larger base means a smaller rate can
achieve the same amount of revenue. Given that we allow $tax_{it}$ to be correlated with $c_i$,
feedback might not be much of a problem. But it cannot be ruled out ahead of time.

10.2. a. $\theta_2$, $\delta_2$, and $\gamma$ can be consistently estimated (assuming all elements of $z_{it}$ are
time-varying). The first period intercept, $\theta_1$, and the coefficient on female, $\delta_1$, cannot be
estimated.

b. Everything else equal, $\theta_2$ measures the growth in wage for men over the period. This is
because, if we set $female_i = 0$ and $z_{i1} = z_{i2}$, the change in log wage is, on average, $\theta_2$ (set
$d2_1 = 0$ and $d2_2 = 1$). We can think of this as being the growth in wage rates (for males) due
to aggregate factors in the economy. The parameter $\delta_2$ measures the difference in wage growth
rates between women and men, all else equal. If $\delta_2 = 0$ then, for men and women with the
same characteristics, average wage growth is the same.


c. Write

$$\log(wage_{i1}) = \theta_1 + z_{i1}\gamma + \delta_1 female_i + c_i + u_{i1}$$
$$\log(wage_{i2}) = \theta_1 + \theta_2 + z_{i2}\gamma + \delta_1 female_i + \delta_2 female_i + c_i + u_{i2},$$

where I have used the fact that $d2_1 = 0$ and $d2_2 = 1$. Subtracting the first equation from the
second gives

$$\Delta\log(wage_i) = \theta_2 + \Delta z_i\gamma + \delta_2 female_i + \Delta u_i.$$

This equation shows explicitly that the growth in wages depends on $\Delta z_i$ and gender. If
$z_{i1} = z_{i2}$ then $\Delta z_i = 0$, and the growth in wage for men is $\theta_2$ and that for women is $\theta_2 + \delta_2$,
just as above. This shows that we can allow for $c_i$ and still test for a gender differential in the
growth of wages. But we cannot say anything about the wage differential between men and
women for a given year.
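10.3. a. Let $\bar{x}_i = (x_{i1} + x_{i2})/2$, $\bar{y}_i = (y_{i1} + y_{i2})/2$, $\ddot{x}_{i1} = x_{i1} - \bar{x}_i$, $\ddot{x}_{i2} = x_{i2} - \bar{x}_i$, and
similarly for $\ddot{y}_{i1}$ and $\ddot{y}_{i2}$. For $T = 2$ the fixed effects estimator can be written as

$$\hat{\beta}_{FE} = \left(\sum_{i=1}^N (\ddot{x}_{i1}'\ddot{x}_{i1} + \ddot{x}_{i2}'\ddot{x}_{i2})\right)^{-1}\left(\sum_{i=1}^N (\ddot{x}_{i1}'\ddot{y}_{i1} + \ddot{x}_{i2}'\ddot{y}_{i2})\right).$$

Now, by simple algebra,

$$\ddot{x}_{i1} = (x_{i1} - x_{i2})/2 = -\Delta x_i/2$$
$$\ddot{x}_{i2} = (x_{i2} - x_{i1})/2 = \Delta x_i/2$$
$$\ddot{y}_{i1} = (y_{i1} - y_{i2})/2 = -\Delta y_i/2$$
$$\ddot{y}_{i2} = (y_{i2} - y_{i1})/2 = \Delta y_i/2.$$

Therefore,

$$\ddot{x}_{i1}'\ddot{x}_{i1} + \ddot{x}_{i2}'\ddot{x}_{i2} = \Delta x_i'\Delta x_i/4 + \Delta x_i'\Delta x_i/4 = \Delta x_i'\Delta x_i/2$$
$$\ddot{x}_{i1}'\ddot{y}_{i1} + \ddot{x}_{i2}'\ddot{y}_{i2} = \Delta x_i'\Delta y_i/4 + \Delta x_i'\Delta y_i/4 = \Delta x_i'\Delta y_i/2,$$

and so

$$\hat{\beta}_{FE} = \left(\sum_{i=1}^N \Delta x_i'\Delta x_i/2\right)^{-1}\left(\sum_{i=1}^N \Delta x_i'\Delta y_i/2\right) = \left(\sum_{i=1}^N \Delta x_i'\Delta x_i\right)^{-1}\left(\sum_{i=1}^N \Delta x_i'\Delta y_i\right) = \hat{\beta}_{FD}.$$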

b. Let $\hat{\ddot{u}}_{i1} = \ddot{y}_{i1} - \ddot{x}_{i1}\hat{\beta}_{FE}$ and $\hat{\ddot{u}}_{i2} = \ddot{y}_{i2} - \ddot{x}_{i2}\hat{\beta}_{FE}$ be the fixed effects residuals for the two
time periods for cross section observation $i$. Since $\hat{\beta}_{FE} = \hat{\beta}_{FD}$, and using the representations
above, we have

$$\hat{\ddot{u}}_{i1} = -\Delta y_i/2 - (-\Delta x_i/2)\hat{\beta}_{FD} = -(\Delta y_i - \Delta x_i\hat{\beta}_{FD})/2 \equiv -\hat{e}_i/2$$
$$\hat{\ddot{u}}_{i2} = \Delta y_i/2 - (\Delta x_i/2)\hat{\beta}_{FD} = (\Delta y_i - \Delta x_i\hat{\beta}_{FD})/2 \equiv \hat{e}_i/2,$$

where $\hat{e}_i \equiv \Delta y_i - \Delta x_i\hat{\beta}_{FD}$ are the first difference residuals, $i = 1,2,...,N$. Therefore,

$$\sum_{i=1}^N (\hat{\ddot{u}}_{i1}^2 + \hat{\ddot{u}}_{i2}^2) = (1/2)\sum_{i=1}^N \hat{e}_i^2.$$

This shows that the sum of squared residuals from the fixed effects regression is exactly one
half the sum of squared residuals from the first difference regression. Since we know the
variance estimate for fixed effects is the SSR divided by $N - K$ (when $T = 2$), and the variance
estimate for first difference is the SSR divided by $N - K$, the error variance from fixed effects
is always half the size of the error variance for first difference estimation, that is, $\hat{\sigma}_u^2 = \hat{\sigma}_e^2/2$
(contrary to what the problem asks you to show). What I wanted you to show is that the
variance matrix estimates of $\hat{\beta}_{FE}$ and $\hat{\beta}_{FD}$ are identical. This is easy since the variance matrix
estimate for fixed effects is

$$\hat{\sigma}_u^2\left(\sum_{i=1}^N (\ddot{x}_{i1}'\ddot{x}_{i1} + \ddot{x}_{i2}'\ddot{x}_{i2})\right)^{-1} = (\hat{\sigma}_e^2/2)\left(\sum_{i=1}^N \Delta x_i'\Delta x_i/2\right)^{-1} = \hat{\sigma}_e^2\left(\sum_{i=1}^N \Delta x_i'\Delta x_i\right)^{-1},$$

which is the variance matrix estimator for the FD estimator. Thus, the standard errors, and in fact
all other test statistics (F statistics), will be numerically identical using the two approaches.
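
The algebra for $T = 2$ can be confirmed numerically. The Python sketch below simulates a two-period panel with a single regressor (so $K = 1$ and sums replace matrix products), then compares the FE and FD slope estimates, sums of squared residuals, and variance estimates. The data generating process, sample size, and seed are illustrative assumptions and do not come from the problem.

```python
import random

random.seed(0)
N, K = 200, 1  # K = 1: one regressor, no intercept or time dummy

# Simulated two-period panel; c is the unobserved effect (illustrative DGP).
x1, x2, y1, y2 = [], [], [], []
for _ in range(N):
    c = random.gauss(0, 1)
    a, b = c + random.gauss(0, 1), c + random.gauss(0, 1)
    x1.append(a)
    x2.append(b)
    y1.append(1.5 * a + c + random.gauss(0, 1))
    y2.append(1.5 * b + c + random.gauss(0, 1))

# First-difference estimator, SSR, and variance estimate.
dx = [b - a for a, b in zip(x1, x2)]
dy = [b - a for a, b in zip(y1, y2)]
sxx_fd = sum(d * d for d in dx)
b_fd = sum(d * e for d, e in zip(dx, dy)) / sxx_fd
ssr_fd = sum((e - b_fd * d) ** 2 for d, e in zip(dx, dy))
var_fd = (ssr_fd / (N - K)) / sxx_fd

# Fixed effects (within) estimator on time-demeaned data.
xd1 = [a - (a + b) / 2 for a, b in zip(x1, x2)]
xd2 = [b - (a + b) / 2 for a, b in zip(x1, x2)]
yd1 = [a - (a + b) / 2 for a, b in zip(y1, y2)]
yd2 = [b - (a + b) / 2 for a, b in zip(y1, y2)]
sxx_fe = sum(p * p + q * q for p, q in zip(xd1, xd2))
sxy_fe = sum(p * r + q * s for p, q, r, s in zip(xd1, xd2, yd1, yd2))
b_fe = sxy_fe / sxx_fe
ssr_fe = sum((r - b_fe * p) ** 2 + (s - b_fe * q) ** 2
             for p, q, r, s in zip(xd1, xd2, yd1, yd2))
var_fe = (ssr_fe / (N - K)) / sxx_fe  # N(T - 1) - K = N - K when T = 2

print(b_fd, b_fe)          # identical slope estimates
print(ssr_fe, ssr_fd / 2)  # FE SSR is half the FD SSR
print(var_fd, var_fe)      # identical variance estimates
```

Up to floating point error, the three printed pairs coincide, matching parts a and b.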

10.4. a. Including the aggregate time effect, $d2_t$, can be very important. Without it, we
must assume that any change in average $y$ over the two time periods is due to the program, and
not to external factors. For example, if $y_{it}$ is the unemployment rate for city $i$ at time $t$, and
$prog_{it}$ denotes a job creation program, we want to be sure that we account for the fact that the
general economy may have worsened or improved over the period. If $d2_t$ is omitted, and
$\theta_2 < 0$ (an improving economy, since unemployment has fallen), we might attribute a decrease
in unemployment to the job creation program, when in fact it had nothing to do with it. For
general $T$, each time period should have its own intercept (otherwise the analysis is not entirely
convincing).

b. The presence of $c_i$ allows program participation to be correlated with unobserved
individual heterogeneity, something crucial in contexts where the experimental group is not
randomly assigned. Two examples are when individuals “self-select” into the program and
when program administrators target specific groups that may benefit more or less from the
program.

c. If we first difference the equation, use the fact that $prog_{i1} = 0$ for all $i$, $d2_1 = 0$, and
$d2_2 = 1$, we get

$$y_{i2} - y_{i1} = \theta_2 + \delta_1 prog_{i2} + u_{i2} - u_{i1},$$

or

$$\Delta y_i = \theta_2 + \delta_1 prog_{i2} + \Delta u_i.$$

Now, the FE (and FD) estimates of $\theta_2$ and $\delta_1$ are just the OLS estimators from this equation
(on cross section data). From basic two-variable regression with a dummy independent
variable, $\hat{\theta}_2$ is the average value of $\Delta y$ over the group with $prog_{i2} = 0$, that is, the control
group. Also, $\hat{\theta}_2 + \hat{\delta}_1$ is the average value of $\Delta y$ over the group with $prog_{i2} = 1$, that is, the
treatment group. Thus, as asserted, we have

$$\hat{\theta}_2 = \Delta\bar{y}_{control}, \quad \hat{\delta}_1 = \Delta\bar{y}_{treat} - \Delta\bar{y}_{control}.$$

If we did not include the $d2_t$, $\hat{\delta}_1 = \Delta\bar{y}_{treat}$, the average change of the treated group. This
demonstrates the claim in part a that without the aggregate time effect any change in the
average value of $y$ for the treated group is attributed to the program. Differencing and
averaging over the treated group allows program participation to depend on time-constant
unobservables affecting the level of $y$, but that does not account for external factors that affect
$y$ for everyone.
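
The claims about group averages are exact algebraic facts for any sample, which the following Python sketch illustrates with simulated first differences. The data generating values (a common change of -0.5 and a program effect of 2.0), the sample size, and the seed are illustrative assumptions.

```python
import random

random.seed(1)
N = 300

# prog is the period-2 program indicator; dy is the change in y.
prog = [1 if random.random() < 0.4 else 0 for _ in range(N)]
dy = [-0.5 + 2.0 * p + random.gauss(0, 1) for p in prog]

# OLS of dy on a constant and prog (simple regression formulas).
pbar = sum(prog) / N
dybar = sum(dy) / N
sxy = sum((p - pbar) * (d - dybar) for p, d in zip(prog, dy))
sxx = sum((p - pbar) ** 2 for p in prog)
delta1_hat = sxy / sxx
theta2_hat = dybar - delta1_hat * pbar

# Group averages of the change in y.
treated = [d for p, d in zip(prog, dy) if p == 1]
control = [d for p, d in zip(prog, dy) if p == 0]
mean_treat = sum(treated) / len(treated)
mean_control = sum(control) / len(control)

# Dropping the intercept (the aggregate time effect) from the differenced equation.
delta1_no_d2 = sum(p * d for p, d in zip(prog, dy)) / sum(p * p for p in prog)

print(theta2_hat, mean_control)
print(delta1_hat, mean_treat - mean_control)
print(delta1_no_d2, mean_treat)
```

In this simulation the estimate without the time effect picks up the common change of -0.5 along with the program effect, which is the attribution problem described above.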


d. In general, for $T$ time periods we have

$$y_{it} = \theta_1 + \theta_2 d2_t + \theta_3 d3_t + ... + \theta_T dT_t + \delta_1 prog_{it} + c_i + u_{it},$$

that is, we have separate year intercepts, an unobserved effect $c_i$, and the program indicator.

e. First, the model from part d is more flexible because it allows any sequence of program
participation. Equation (10.89), when extended to $T > 2$, applies only when treatment is
ongoing. In addition, (10.89) is restrictive in terms of aggregate time effects: it assumes that
any aggregate time effects correspond to the start of the program only. It is better to use the
unobserved effects model from part d, and estimate it using either FE or FD.

10.5. a. Write $v_iv_i' = c_i^2j_Tj_T' + u_iu_i' + j_T(c_iu_i') + (c_iu_i)j_T'$. Under RE.1, $E(u_i|x_i,c_i) = 0$,
which implies that $E(c_iu_i'|x_i) = 0$ by iterated expectations. Under RE.3a, $E(u_iu_i'|x_i,c_i) = \sigma_u^2I_T$,
which implies that $E(u_iu_i'|x_i) = \sigma_u^2I_T$ (again, by iterated expectations). Therefore,

$$E(v_iv_i'|x_i) = E(c_i^2|x_i)j_Tj_T' + E(u_iu_i'|x_i) = h(x_i)j_Tj_T' + \sigma_u^2I_T,$$

where $h(x_i) \equiv Var(c_i|x_i) = E(c_i^2|x_i)$ (by RE.1b). This shows that the conditional variance
matrix of $v_i$ given $x_i$ has the same covariance for all $t \neq s$, $h(x_i)$, and the same variance for all
$t$, $h(x_i) + \sigma_u^2$. Therefore, while the variances and covariances depend on $x_i$, they do not depend
on time.
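
To visualize this structure, the conditional variance matrix can be written out for a small panel. The display below takes $T = 3$ purely for illustration; every diagonal element is $h(x_i) + \sigma_u^2$ and every off-diagonal element is $h(x_i)$.

```latex
E(v_i v_i' \mid x_i) =
\begin{pmatrix}
h(x_i) + \sigma_u^2 & h(x_i) & h(x_i) \\
h(x_i) & h(x_i) + \sigma_u^2 & h(x_i) \\
h(x_i) & h(x_i) & h(x_i) + \sigma_u^2
\end{pmatrix}
```

The matrix has the usual random effects pattern, except that the common covariance is a function of $x_i$ rather than a constant.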

b. The RE estimator is still consistent and $\sqrt{N}$-asymptotically normal without Assumption
RE.3b, but the usual random effects variance estimator of $Avar(\hat{\beta}_{RE})$ is no longer valid because
$E(v_iv_i'|x_i)$ does not have the form (10.30) (because it depends on $x_i$). The robust variance
matrix estimator given in (7.52) should be used in obtaining standard errors and Wald
statistics.

10.6. a. By stacking the formulas for the FD and FE estimators, and using standard
asymptotic arguments, we have, under FE.1 and the rank conditions,

$$\sqrt{N}(\hat{\theta} - \theta) = G^{-1}\left(N^{-1/2}\sum_{i=1}^N s_i\right) + o_p(1),$$

where $G$ is the $2K \times 2K$ block diagonal matrix with blocks $A_1$ and $A_2$, respectively, and $s_i$ is
the $2K \times 1$ vector

$$s_i \equiv \begin{pmatrix} \Delta X_i'\Delta u_i \\ \ddot{X}_i'\ddot{u}_i \end{pmatrix}.$$


b. Let $\Delta\hat{u}_i$ denote the $(T - 1) \times 1$ vector of FD residuals, and let $\hat{\ddot{u}}_i$ denote the $T \times 1$ vector
of FE residuals. Plugging these into the formula for $s_i$ gives $\hat{s}_i$. Let $\hat{D} = N^{-1}\sum_{i=1}^N \hat{s}_i\hat{s}_i'$, and
define $\hat{G}$ by replacing $A_1$ and $A_2$ with their obvious consistent estimators. Then
$\widehat{Avar}[\sqrt{N}(\hat{\theta} - \theta)] = \hat{G}^{-1}\hat{D}\hat{G}^{-1}$ is a consistent estimator of $Avar[\sqrt{N}(\hat{\theta} - \theta)]$.

c. Let $R$ be the $K \times 2K$ partitioned matrix $R = [I_K \mid -I_K]$. Then the null hypothesis
imposed by the Hausman test is $H_0: R\theta = 0$. We can form a Wald-type statistic,

$$H = (R\hat{\theta})'[R\hat{G}^{-1}\hat{D}\hat{G}^{-1}R'/N]^{-1}(R\hat{\theta}).$$

Under FE.1 and the rank conditions for FD and FE, $H$ has a limiting $\chi_K^2$ distribution. The
statistic requires no particular second moment assumptions of the kind in FE.3. Note that
$R\hat{\theta} = \hat{\beta}_{FD} - \hat{\beta}_{FE}$.

10.7. a. The random effects estimates are given below. The coefficient on season is -.044,
which means that being in season is estimated to reduce an athlete’s term GPA by .044 points.
The nonrobust t statistic is only -1.12.


use gpa

xtset id term
       panel variable:  id (strongly balanced)
        time variable:  term, 8808 to 8901, but with gaps
                delta:  1 unit

xtreg trmgpa spring crsgpa frstsem season sat verbmath hsperc hssize black
        female, re

Random-effects GLS regression                   Number of obs      =       732
Group variable: id                              Number of groups   =       366

R-sq:  within  = 0.2067                         Obs per group: avg =       2.0
       between = 0.5390
       overall = 0.4785

Random effects u_i ~ Gaussian                   Wald chi2(10)      =    512.77
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

      trmgpa |      Coef.   Std. Err.    P>|z|     [95% Conf. Interval]
-------------+-----------------------------------------------------------
      spring |  -.0606536   .0371605    0.103    -.1334868    .0121797
      crsgpa |   1.082365   .0930877    0.000     .8999166    1.264814
     frstsem |   .0029948   .0599542    0.960    -.1145132    .1205028
      season |  -.0440992               0.261    -.1210044     .032806
         sat |   .0017052               0.000     .0013582    .0020523
    verbmath |    -.15752               0.335    -.4779937    .1629538
      hsperc |                          0.000    -.0108977   -.0060268
      hssize |                          0.534     -.000322     .000167
       black |                          0.001    -.3684048   -.1012331
      female |                          0.000     .2380173    .4782886
       _cons |                          0.000     -2.43396   -1.035879
-------------+-----------------------------------------------------------
     sigma_u |  .37185442
     sigma_e |  .40882825
         rho |   .4527451   (fraction of variance due to u_i)


b. Below are the fixed effects estimates with nonrobust standard errors. The time-constant
variables have been dropped. The coefficient on season is now larger in magnitude, -.057, and
it is more statistically significant with t = -1.37.


xtreg trmgpa spring crsgpa frstsem season, fe

Fixed-effects (within) regression               Number of obs      =       732
Group variable: id                              Number of groups   =       366

R-sq:  within  = 0.2069                         Obs per group: avg =       2.0
       between = 0.0333
       overall = 0.0613

                                                F(4,362)           =     23.61
                                                Prob > F           =    0.0000

      trmgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      spring |  -.0657817   .0391404    -1.68   0.094    -.1427528    .0111895
      crsgpa |   1.140688   .1186538     9.61   0.000     .9073505    1.374025
     frstsem |   .0128523   .0688364     0.19   0.852    -.1225172    .1482218
      season |  -.0566454   .0414748    -1.37   0.173    -.1382072    .0249165
       _cons |  -.7708055   .3305004    -2.33   0.020    -1.420747   -.1208636
-------------+----------------------------------------------------------------
     sigma_u |  .67913296
     sigma_e |  .40882825
         rho |  .73400603   (fraction of variance due to u_i)

F test that all u_i=0:     F(365, 362) =     5.40            Prob > F = 0.0000


c. The following Stata output gives the nonrobust and fully robust regression-based
Hausman test. Whether we test the three time averages, crsgpabar, frstsembar, and seasonbar,
or just seasonbar, the p-value is large (.608 for the joint nonrobust test, .337 for the single
nonrobust test). And the findings do not depend on using a robust test: the p-values are a little
smaller but not close to being significant.


For comparison, the traditional way of computing the Hausman statistic (directly forming
the quadratic form in the FE and RE estimates) is included at the end, computed two different
ways. The first uses the difference in the estimated variance matrices, and the value of the
statistic, 1.81, is very close to the nonrobust, regression-based statistic, 1.83. But the degrees
of freedom reported by Stata are incorrect: it should be three, not four. Thus, the p-value
reported by Stata using the hausman command is too large. This will be the case whenever
aggregate time variables (most commonly, time period dummies) are included among the
coefficients to test.


If we impose that the RE estimate of the variance $\sigma_u^2$ is used to estimate both the FE and
RE asymptotic variances, Stata then recognizes that the variance matrix has rank three rather
than rank four. (The same is true if we use the FE estimate of $\sigma_u^2$ in both places.) The p-value
in this case agrees very closely with that for the nonrobust, regression-based test (and both
statistics are 1.83 rounded to two decimal places).
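
The effect of using the wrong degrees of freedom can be seen by computing chi-square upper-tail probabilities directly. The Python sketch below uses closed-form expressions for 1, 3, and 4 degrees of freedom so that only the standard library is needed; the helper name `chi2_sf` is an illustrative choice, and the statistics 1.83, 0.92, and 1.81 are the rounded values discussed in this part.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate, df in {1, 3, 4}."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    if df == 4:
        return math.exp(-x / 2) * (1 + x / 2)
    raise ValueError("only df = 1, 3, 4 are implemented in this sketch")

p_joint = chi2_sf(1.83, 3)     # regression-based test of the three time averages
p_single = chi2_sf(0.92, 1)    # test of seasonbar alone
p_wrong_df = chi2_sf(1.81, 4)  # traditional statistic with four df
p_right_df = chi2_sf(1.81, 3)  # same statistic with the correct three df

print(round(p_joint, 4), round(p_single, 4))
print(round(p_wrong_df, 4), round(p_right_df, 4))
```

With the statistic held fixed, moving from three to four degrees of freedom raises the p-value, which is why the p-value from the hausman command is too large here.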


egen crsgpabar = mean(crsgpa), by(id)

egen frstsembar = mean(frstsem), by(id)

egen seasonbar = mean(season), by(id)

xtreg trmgpa spring crsgpa frstsem season sat verbmath hsperc hssize black
        female crsgpabar frstsembar seasonbar, re

Random-effects GLS regression                   Number of obs      =       732
Group variable: id                              Number of groups   =       366

R-sq:  within  = 0.2069                         Obs per group: avg =       2.0
       between = 0.5408
       overall = 0.4802

Random effects u_i ~ Gaussian                   Wald chi2(13)      =    513.77
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

      trmgpa |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
      spring |  -.0657817   .0391404     -.1424954    .0109321
      crsgpa |   1.140688   .1186538      .9081308    1.373245
     frstsem |   .0128523   .0688364     -.1220646    .1477692
      season |  -.0566454   .0414748     -.1379345    .0246438
         sat |   .0016681   .0001804      .0013145    .0020218
    verbmath |  -.1316461   .1654748     -.4559708    .1926785
      hsperc |  -.0084655   .0012554     -.0109259    -.006005
      hssize |  -.0000783    .000125     -.0003232    .0001666
       black |  -.2447934   .0686106     -.3792676   -.1103192
      female |   .3357016   .0711808      .1961898    .4752134
   crsgpabar |  -.1861551   .2011254     -.5803537    .2080434
  frstsembar |   -.078244   .1461014     -.3645975    .2081095
   seasonbar |   .1243006   .1293555     -.1292315    .3778326
       _cons |  -1.423761   .5183296     -2.439668   -.4078539
-------------+--------------------------------------------------
     sigma_u |  .37185442
     sigma_e |  .40882825
         rho |   .4527451   (fraction of variance due to u_i)


crsgpabar frstsembar 


1) crsgpabar = 0 
2) frstsembar = 0 
3) seasonbar = 0 


chi2( 
Prob > chi2 


3) = 


seasonbar 


1) seasonbar = 0 


chi2( 1) = 
Prob > chi2 = 


xtreg trmgpa 
female 


Random-effects 


Group variable: 


R- 


sq: within 


between 
overall 


Random effects 
corr(u_i, X) 


seasonbar 


.83 
. 6084 


92 
. 3366 


spring crsgpa frstsem season sat verbmath hsperc hssize black 
re cluster(id) 


crsgpabar frstsembar seasonbar , 


GLS regression 


id 


= 0.2069 
= 0.5408 
= 0.4802 


u_i ~Gaussian 


= 0 (assumed) 


Number of obs 
Number of groups 


Obs per group: 


avg 


max = 


Wald chi2(13) 


Prob > chi2 


min = 


732 
366 
2. 
629.75 
0.0000 


adjusted for 366 clusters in id 


spring 
crsgpa 
frstsem 
season 
sat 
verbmath 
hsperc 
hssize 
black 
female 
crsgpabar 
frstsembar 
seasonbar 
_cons 


- .0657817 
1.140688 
0128523 

- .0566454 
.0016681 

- . 1316461 

- .0084655 

- .0000783 

- .2447934 
. 3357016 

- . 1861551 
- .078244 
.1243006 

-1.423761 


(Std. Err. 
Robust 
Std. Err Z 
. 0394865 -1.67 
. 1317893 8.66 
. 0684334 0.19 
.0411639 -1.38 
.0001848 9.03 
.166478 -0.79 
.0013131 -6.45 
.0001172 -0.67 
.075569 -3.24 
.067753 4.95 
. 1956503 -0.95 
. 1465886 -0.53 
. 1342238 0.93 
. 4571037 -3.11 


OO0OO0O0O0O0O000000O0O0O 


- .1431737 
. 8823856 
-.1212746 
- .1373251 
.0013059 
- .4579371 
- .0110391 
- .000308 
- .392906 
2029081 
- .5696227 
- .3655525 
- .1387732 
-2.319668 


.0116104 
1.39899 
. 1469793 
. 0240344 
.0020304 
. 1946448 
- .0058918 
.0001514 
- .0966808 
. 4684951 
.1973125 
. 2090644 
. 3873743 
- .5278545 


Sigma_u | .37185442 
sigma_e | .40882825 
rho | -4527451 (fraction of variance due to u_i) 


test crsgpabar frstsembar seasonbar 


( 1) crsgpabar = 0 
( 2) frstsembar = 0 
( 3) seasonbar = 0 


chi2( 3) = 1.95 
Prob > chi2 = 0.5829 
test seasonbar 
( 1) seasonbar = 0 
chi2( 1) = 0.86 
Prob > chi2 = 0.3544 


* The traditional Hausman test that incorrectly includes the coefficients
* on "spring" (the time dummy) among those being tested.

qui xtreg trmgpa spring crsgpa frstsem season, fe
estimates store fe

qui xtreg trmgpa spring crsgpa frstsem season sat verbmath hsperc hssize
black female, re
estimates store re

hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
      spring |   -.0657817    -.0606536       -.0051281         .012291
      crsgpa |    1.140688     1.082365        .0583227        .0735758
     frstsem |    .0128523     .0029948        .0098575        .0338223
      season |   -.0566454    -.0440992       -.0125462        .0134363

    b = consistent under Ho and Ha; obtained from xtreg
    B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic
                  chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =     1.81
                Prob>chi2 =   0.7702

qui xtreg trmgpa spring crsgpa frstsem season, fe
estimates store fe

qui xtreg trmgpa spring crsgpa frstsem season sat verbmath hsperc hssize black
female, re
estimates store re

hausman fe re, sigmamore

Note: the rank of the differenced variance matrix (3) does not equal the
number of coefficients being tested (4); be sure
this is what you expect, or there may be problems computing the test.
Examine the output of your estimators for anything unexpected and
possibly consider scaling your variables so that the coefficients are
on a similar scale.

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
      spring |   -.0657817    -.0606536       -.0051281        .0121895
      crsgpa |    1.140688     1.082365        .0583227        .0734205
     frstsem |    .0128523     .0029948        .0098575        .0337085
      season |   -.0566454    -.0440992       -.0125462         .013332

    b = consistent under Ho and Ha; obtained from xtreg
    B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic
                  chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =     1.83
                Prob>chi2 =   0.6077
                (V_b-V_B is not positive definite)


10.8. a. The Stata output is below. The coefficients on the lagged “clear-up” percentages
are very close in magnitude. For example, if the first lag is 10 percentage points higher, the
crime rate is estimated to fall by about 18.5 percent, a very large effect. The estimate of ρ in
the AR(1) serial correlation test is .574 and t = 5.82, so there is very strong evidence of serial
correlation.
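The 18.5 percent figure comes from the usual approximation for a log-level model: 100 times the coefficient times the change in the regressor. The short sketch below is an added illustration, not part of the original solution. It reproduces that arithmetic and also computes the exact percentage change implied by the log model, which is somewhat smaller for a change this large. The variable names are hypothetical.

```python
import math

beta_clrprc_1 = -0.0184955  # pooled OLS coefficient on clrprc_1, from the output
delta = 10.0                # increase in the lagged clear-up percentage, in points

# Approximation used in the text: %change in crime ~ 100 * beta * delta.
approx_pct = 100.0 * beta_clrprc_1 * delta

# Exact percentage change implied by a log-level model.
exact_pct = 100.0 * (math.exp(beta_clrprc_1 * delta) - 1.0)

print(f"approximate effect: {approx_pct:.1f} percent")
print(f"exact effect:       {exact_pct:.1f} percent")
```

Either way the estimated effect is very large, so the qualitative conclusion does not depend on which calculation is used.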
use norway
xtset district year, delta(6)
       panel variable:  district (strongly balanced)
        time variable:  year, 72 to 78
                delta:  6 units

reg lcrime d78 clrprc_1 clrprc_2

      Source |       SS       df       MS              Number of obs =     106
-------------+------------------------------           F(  3,   102) =   30.27
       Model |  18.7948264     3  6.26494214           Prob > F      =  0.0000
    Residual |  21.1114968   102  .206975459           R-squared     =  0.4710
-------------+------------------------------           Adj R-squared =  0.4554
       Total |  39.9063233   105  .380060222

      lcrime |      Coef.   Std. Err.      t      [95% Conf. Interval]
-------------+---------------------------------------------------------
         d78 |  -.0547246   .0944947    -0.58    -.2421544    .1327051
    clrprc_1 |  -.0184955   .0053035    -3.49    -.0290149    -.007976
    clrprc_2 |  -.0173881   .0054376    -3.20    -.0281735   -.0066026
       _cons |    4.18122   .1878879    22.25     3.808545    4.553894

predict vhat, resid
gen vhat_1 = l.vhat
(53 missing values generated)

reg vhat vhat_1

      Source |       SS       df       MS              Number of obs =      53
-------------+------------------------------
       Model |   3.8092697     1   3.8092697
    Residual |  5.73894345    51  .112528303
-------------+------------------------------
       Total |  9.54821315    52  .183619484

        vhat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      vhat_1 |   .5739582   .0986485     5.82   0.000     .3759132    .7720033
       _cons |  -3.01e-09   .0460779    -0.00   1.000    -.0925053    .0925053


b. The fixed effects estimates are given below. The coefficient on clrprc_1 falls
dramatically in magnitude, and becomes statistically insignificant. The coefficient on clrprc_2
falls somewhat but is still practically large and statistically significant.

To obtain the heteroskedasticity-robust standard error for FE, we must use the FD
estimation (which is the same as FE because T = 2) in order to make the calculation simple.
Stock and Watson (2008, Econometrica) show that just applying the usual
heteroskedasticity-robust standard error using pooled regression on the time-demeaned data
does not produce valid standard errors. The reason is simple: as shown in the text, the time
demeaning induces serial correlation in the errors. Of course, one can always use the fully
robust standard errors, which allow for any kind of serial correlation in the original errors and
any kind of heteroskedasticity. In this example, obtaining the heteroskedasticity-robust
standard errors has little effect on inference.
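The claim that FD and FE coincide when T = 2 can be checked numerically. The sketch below is an added illustration on made-up data (the panel values and helper names are hypothetical, not from the NORWAY data). It compares the slope from regressing the change in y on the change in x and a constant with the slope from the within regression that also includes a second-period dummy. Partialling out the dummy is done by removing period means after removing unit means.

```python
# Made-up two-period panel: unit -> ((x1, y1), (x2, y2)). Purely illustrative.
panel = {
    1: ((2.0, 1.0), (3.0, 2.5)),
    2: ((1.0, 0.5), (4.0, 3.0)),
    3: ((5.0, 4.0), (5.5, 3.5)),
    4: ((0.0, 1.0), (2.0, 4.0)),
}


def mean(values):
    return sum(values) / len(values)


def ols_slope(xs, ys):
    """Slope from an OLS regression of ys on xs and a constant."""
    xbar, ybar = mean(xs), mean(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den


# First differencing: regress the change in y on the change in x and a constant.
dx = [p2[0] - p1[0] for p1, p2 in panel.values()]
dy = [p2[1] - p1[1] for p1, p2 in panel.values()]
b_fd = ols_slope(dx, dy)

# Fixed effects with a period dummy: remove unit means, then period means.
x_within, y_within, period = [], [], []
for p1, p2 in panel.values():
    xbar_i = (p1[0] + p2[0]) / 2.0
    ybar_i = (p1[1] + p2[1]) / 2.0
    for t, (x, y) in enumerate((p1, p2), start=1):
        x_within.append(x - xbar_i)
        y_within.append(y - ybar_i)
        period.append(t)


def demean_by_period(values):
    """Subtract the period-specific mean from each observation."""
    means = {t: mean([v for v, s in zip(values, period) if s == t]) for t in (1, 2)}
    return [v - means[s] for v, s in zip(values, period)]


b_fe = ols_slope(demean_by_period(x_within), demean_by_period(y_within))
print(b_fd, b_fe)
```

The two slopes agree up to floating-point error. With more than two periods they generally differ.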


xtreg lcrime d78 clrprc_1 clrprc_2, fe 


Fixed-effects (within) regression               Number of obs      =       106
Group variable: district                        Number of groups   =        53
R-sq:  within  = 0.4209                         Obs per group: min =         2
       between = 0.4798                                        avg =       2.0
       overall = 0.4234                                        max =         2
                                                F(3,50)            =     12.12
corr(u_i, Xb)  = 0.3645                         Prob > F           =    0.0000

      lcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d78 |   .0856556   .0637825     1.34   0.185    -.0424553    .2137665
    clrprc_1 |  -.0040475   .0047199    -0.86   0.395    -.0135276    .0054326
    clrprc_2 |  -.0131966   .0051946    -2.54   0.014    -.0236302   -.0027629
       _cons |   3.350995   .2324736    14.41   0.000     2.884058    3.817932
-------------+----------------------------------------------------------------
     sigma_u |  .47140473
     sigma_e |   .2436645
         rho |  .78915666   (fraction of variance due to u_i)
F test that all u_i=0:     F(52, 50) =     5.88              Prob > F = 0.0000

reg clcrime cclrprc_1 cclrprc_2, robust

Linear regression                                      Number of obs =      53
                                                       F(  2,    50) =    4.78
                                                       Prob > F      =  0.0126
                                                       R-squared     =  0.1933
                                                       Root MSE      =  .34459

             |               Robust
     clcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   cclrprc_1 |  -.0040475   .0042659    -0.95   0.347    -.0126158    .0045207
   cclrprc_2 |  -.0131966   .0047286    -2.79   0.007    -.0226942    -.003699
       _cons |   .0856556   .0554876     1.54   0.129    -.0257945    .1971057


c. I use the FD regression to easily allow for heteroskedasticity. The two-sided p-value is
.183. Because we do not reject H0: β1 = β2 at even the 15% level, we might justify
estimating a model with β1 = β2 (and the pooled OLS results suggest it, too). The variable
avgclr is the average of clrprc_1 and clrprc_2, and so we can use it as the only explanatory
variable. Imposing the restriction gives a large estimated effect (a 10 percentage point
increase in the average clear-up rate decreases crime by about 16.7%), and the
heteroskedasticity-robust t statistic is -2.89.


qui reg clcrime cclrprc_1 cclrprc_2, robust
lincom cclrprc_1 - cclrprc_2

 ( 1)  cclrprc_1 - cclrprc_2 = 0

     clcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .009149   .0067729     1.35   0.183    -.0044548    .0227529

reg clcrime cavgclr, robust

Linear regression                                      Number of obs =      53
                                                       F(  1,    51) =    8.38
                                                       Prob > F      =  0.0056
                                                       R-squared     =  0.1747
                                                       Root MSE      =  .34511

             |               Robust
     clcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     cavgclr |  -.0166511   .0057529    -2.89   0.006    -.0282006   -.0051016
       _cons |   .0993289   .0554764     1.79   0.079    -.0120446    .2107024

reg clcrime cavgclr

      Source |       SS       df       MS              Number of obs =      53
-------------+------------------------------           F(  1,    51) =   10.80
       Model |  1.28607105     1  1.28607105           Prob > F      =  0.0018
    Residual |  6.07411496    51  .119100293           R-squared     =  0.1747
-------------+------------------------------           Adj R-squared =  0.1586
       Total |  7.36018601    52  .141542039           Root MSE      =  .34511

     clcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     cavgclr |  -.0166511   .0050672    -3.29   0.002    -.0268239   -.0064783
       _cons |   .0993289   .0625916     1.59   0.119    -.0263289    .2249867


10.9. a. The RE and FE estimates, with fully robust standard errors for each, are given
below. The variable-addition Hausman test is obtained by adding time averages of all variables
except the year dummies; all other explanatory variables change across i and t. For
comparison, the traditional nonrobust Hausman statistic is computed. This version uses the RE
estimate of σ_u^2 in estimating both the RE and FE asymptotic variances and properly computes
the degrees of freedom (which is five for this application).

While there are differences in the RE and FE estimates, the signs are the same and the
magnitudes are similar. The fully robust Hausman test gives a strong statistical rejection of the
RE assumption that county heterogeneity is uncorrelated with the criminal justice variables.
Therefore, for magnitudes, we should prefer the FE estimates. (Remember, though, that both
RE and FE maintain strict exogeneity conditional on the heterogeneity.) The nonrobust
Hausman test gives a substantially larger statistic, 78.79 compared with 60.53, but the
conclusion is the same.


xtreg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87, re
cluster(county)

Random-effects GLS regression                   Number of obs      =       630
Group variable: county                          Number of groups   =        90
R-sq:  within  = 0.4287                         Obs per group: min =         7
       between = 0.4533                                        avg =       7.0
       overall = 0.4454                                        max =         7
Random effects u_i ~Gaussian                    Wald chi2(11)      =    156.83
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

                                (Std. Err. adjusted for 90 clusters in county)
             |               Robust
     lcrmrte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.4252097   .0629147    -6.76   0.000    -.5485202   -.3018993
    lprbconv |  -.3271464   .0499587    -6.55   0.000    -.4250636   -.2292292
    lprbpris |  -.1793507   .0457547    -3.92   0.000    -.2690283   -.0896731
     lavgsen |  -.0083696   .0322377    -0.26   0.795    -.0715543    .0548152
      lpolpc |   .4294148   .0878659     4.89   0.000     .2572007    .6016288
         d82 |   .0137442   .0164857     0.83   0.404    -.0185671    .0460556
         d83 |   -.075388   .0194832    -3.87   0.000    -.1135743   -.0372017
         d84 |  -.1130975   .0217025    -5.21   0.000    -.1556335   -.0705614
         d85 |  -.1057261   .0254587    -4.15   0.000    -.1556242   -.0558279
         d86 |  -.0795307   .0239141    -3.33   0.001    -.1264014   -.0326599
         d87 |  -.0424581   .0246408    -1.72   0.085    -.0907531     .005837
       _cons |  -1.672632   .5678872    -2.95   0.003    -2.785671   -.5595939
-------------+----------------------------------------------------------------
     sigma_u |  .30032934
     sigma_e |  .13871215
         rho |  .82418424   (fraction of variance due to u_i)

xtreg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87, fe
cluster(county)

Fixed-effects (within) regression               Number of obs      =       630
Group variable: county                          Number of groups   =        90
R-sq:  within  = 0.4342                         Obs per group: min =         7
       between = 0.4066                                        avg =       7.0
       overall = 0.4042                                        max =         7
                                                F(11, 89)          =     11.49
corr(u_i, Xb)  = 0.2068                         Prob > F           =    0.0000

                                (Std. Err. adjusted for 90 clusters in county)
             |               Robust
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.3597944   .0594678    -6.05   0.000    -.4779557   -.2416332
    lprbconv |  -.2858733    .051522    -5.55   0.000    -.3882464   -.1835001
    lprbpris |  -.1827812   .0452811    -4.04   0.000    -.2721538   -.0928085
     lavgsen |  -.0044879   .0333499    -0.13   0.893    -.0707535    .0617777
      lpolpc |   .4241142   .0849052     5.00   0.000     .2554095     .592819
         d82 |   .0125802   .0160066     0.79   0.434    -.0192246     .044385
         d83 |  -.0792813   .0195639    -4.05   0.000    -.1181544   -.0404081
         d84 |  -.1177281   .0217118    -5.42   0.000     -.160869   -.0745872
         d85 |  -.1119561   .0256583    -4.36   0.000    -.1629386   -.0609736
         d86 |  -.0818268   .0236276    -3.46   0.001    -.1287745   -.0348792
         d87 |  -.0404704   .0241765    -1.67   0.098    -.0885087    .0075678
       _cons |  -1.604135   .5102062    -3.14   0.002    -2.617904   -.5903664
-------------+----------------------------------------------------------------
     sigma_u |  .43487416
     sigma_e |  .13871215
         rho |  .90765322   (fraction of variance due to u_i)


egen lprbatb = mean(lprbarr), by(county) 


egen lprbctb = mean(lprbconv), by(county) 


egen lprbptb = mean(lprbpris), by(county) 


egen lavgtb = mean(lavgsen), by(county) 


egen lpoltb = mean(lpolpc), by(county) 


qui xtreg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87 lprbatb
lprbctb lprbptb lavgtb lpoltb, re cluster(county)

test lprbatb lprbctb lprbptb lavgtb lpoltb

 ( 1)  lprbatb = 0
 ( 2)  lprbctb = 0
 ( 3)  lprbptb = 0
 ( 4)  lavgtb = 0
 ( 5)  lpoltb = 0

           chi2(  5) =   60.53
         Prob > chi2 =    0.0000


qui xtreg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87, fe
estimates store fe

qui xtreg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87, re
estimates store re

hausman fe re, sigmamore

Note: the rank of the differenced variance matrix (3) does not equal the
number of coefficients being tested (4); be sure
this is what you expect, or there may be problems computing the test.
Examine the output of your estimators for anything unexpected and
possibly consider scaling your variables so that the coefficients are
on a similar scale.

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
     lprbarr |   -.3597944    -.4252097        .0654153        .0133827
    lprbconv |   -.2858733    -.3271464        .0412731        .0084853
    lprbpris |   -.1827812    -.1793507       -.0034305        .0065028
     lavgsen |   -.0044879    -.0083696        .0038816        .0037031
      lpolpc |    .4241142     .4294148       -.0053005        .0103217
         d82 |    .0125802     .0137442        -.001164        .0010763
         d83 |   -.0792813     -.075388       -.0038933        .0008668
         d84 |   -.1177281    -.1130975       -.0046306        .0013163
         d85 |   -.1119561    -.1057261         -.00623        .0014304
         d86 |   -.0818268    -.0795307       -.0022962        .0007719
         d87 |   -.0404704    -.0424581        .0019876         .001219

    b = consistent under Ho and Ha; obtained from xtreg
    B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic
                  chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =    78.79
                Prob>chi2 =   0.0000
                (V_b-V_B is not positive definite)


b. Below is the Stata output for fixed effects with the nine wage variables; the inference is
fully robust. The log wage variables are jointly significant at the 1.2% significance level.
Unfortunately, one of the most significant wage variables, lwtuc, has a positive, statistically
significant coefficient. It is difficult to give that a causal interpretation. The coefficient on the
manufacturing wage implies that a ceteris paribus 10% increase in manufacturing wage
reduces the crime rate by about three percent.


xtreg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87 lwcon lwtuc
lwtrd lwfir lwser lwmfg lwfed lwsta lwloc, fe cluster(county)

Fixed-effects (within) regression               Number of obs      =       630
Group variable: county                          Number of groups   =        90
R-sq:  within  = 0.4575                         Obs per group: min =         7
       between = 0.2518                                        avg =       7.0
       overall = 0.2687                                        max =         7
                                                F(20,89)           =     13.39
corr(u_i, Xb)  = 0.0804                         Prob > F           =    0.0000

                                (Std. Err. adjusted for 90 clusters in county)
             |               Robust
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.3563515   .0615202    -5.79   0.000    -.4785909   -.2341121
    lprbconv |  -.2859539   .0507647    -5.63   0.000    -.3868224   -.1850855
    lprbpris |  -.1751355   .0457628    -3.83   0.000    -.2660652   -.0842058
     lavgsen |  -.0028739   .0333994    -0.09   0.932    -.0692379      .06349
      lpolpc |      .4229   .0822172     5.14   0.000     .2595362    .5862639
         d82 |   .0188915   .0221161     0.85   0.395    -.0250527    .0628357
         d83 |   -.055286   .0306016    -1.81   0.074    -.1160906    .0055187
         d84 |  -.0615162   .0406029    -1.52   0.133    -.1421934    .0191609
         d85 |  -.0397115   .0603405    -0.66   0.512    -.1596068    .0801837
         d86 |  -.0001133   .0720231    -0.00   0.999    -.1432217    .1429952
         d87 |   .0537042   .0847749     0.63   0.528    -.1147418    .2221501
       lwcon |  -.0345448   .0253345    -1.36   0.176    -.0848839    .0157943
       lwtuc |   .0459747   .0161107     2.85   0.005      .013963    .0779864
       lwtrd |  -.0201766   .0313131    -0.64   0.521    -.0823951    .0420418
       lwfir |  -.0035445   .0130093    -0.27   0.786    -.0293937    .0223047
       lwser |   .0101264   .0202248     0.50   0.618    -.0300599    .0503128
       lwmfg |  -.3005691   .1063746    -2.83   0.006    -.5119331    -.089205
       lwfed |  -.3331226   .2245785    -1.48   0.142    -.7793554    .1131101
       lwsta |   .0215209   .1051755     0.20   0.838    -.1874606    .2305023
       lwloc |   .1810215   .1629903     1.11   0.270    -.1428368    .5048797
       _cons |   .8931726   1.479937     0.60   0.548     -2.04743    3.833775
-------------+----------------------------------------------------------------
     sigma_u |  .47756823
     sigma_e |  .13700505
         rho |  .92395784   (fraction of variance due to u_i)

testparm lwcon-lwloc

 ( 1)  lwcon = 0
 ( 2)  lwtuc = 0
 ( 3)  lwtrd = 0
 ( 4)  lwfir = 0
 ( 5)  lwser = 0
 ( 6)  lwmfg = 0
 ( 7)  lwfed = 0
 ( 8)  lwsta = 0
 ( 9)  lwloc = 0

       F(  9,    89) =    2.54
            Prob > F =    0.0121


c. First, we need to compute the changes in log wages. Then, we just use pooled OLS.
Rather than difference the year dummies we just include dummies for 1983 through 1987.
Both the usual and fully robust standard errors are computed and compared with those from FE.

The nonrobust FD and FE standard errors are similar, and often very different from the
comparable robust standard errors. In fact, in many cases the robust standard errors are double
or more than the nonrobust ones, although some robust ones are actually smaller. The wage
variables generally have much smaller coefficients when FD is used, but they are still jointly
significant using a robust test.
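The differencing step relies on the data being sorted by county and year, with the first year of each county set to missing so that no difference is taken across counties. The following sketch is an added illustration of that logic on made-up numbers. The function name, the dictionary layout, and the data values are hypothetical and are not taken from the data set used here.

```python
# Made-up long-format panel; values are illustrative only.
rows = [
    {"county": 1, "year": 81, "lwcon": 5.30},
    {"county": 1, "year": 82, "lwcon": 5.34},
    {"county": 1, "year": 83, "lwcon": 5.41},
    {"county": 3, "year": 81, "lwcon": 5.10},
    {"county": 3, "year": 82, "lwcon": 5.08},
    {"county": 3, "year": 83, "lwcon": 5.15},
]


def first_difference(rows, unit, time, var):
    """Return x_it - x_i,t-1 in (unit, time) order; None when no prior row for the unit exists."""
    ordered = sorted(rows, key=lambda r: (r[unit], r[time]))
    diffs = []
    previous = None
    for row in ordered:
        if previous is not None and previous[unit] == row[unit]:
            diffs.append(row[var] - previous[var])
        else:
            # First observation for this unit: the difference is missing.
            diffs.append(None)
        previous = row
    return diffs


clwcon = first_difference(rows, "county", "year", "lwcon")
n_missing = sum(d is None for d in clwcon)
print(clwcon)
print(n_missing, "missing values generated")
```

In this toy panel one value per county is missing, which mirrors the 90 missing values (one per county) reported for each differenced wage variable.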


. gen clwcon = lwcon - lwcon[_n-1] if year > 81
(90 missing values generated)

. gen clwtuc = lwtuc - lwtuc[_n-1] if year > 81
(90 missing values generated)

. gen clwtrd = lwtrd - lwtrd[_n-1] if year > 81
(90 missing values generated)

. gen clwfir = lwfir - lwfir[_n-1] if year > 81
(90 missing values generated)

. gen clwser = lwser - lwser[_n-1] if year > 81
(90 missing values generated)

. gen clwmfg = lwmfg - lwmfg[_n-1] if year > 81
(90 missing values generated)

. gen clwfed = lwfed - lwfed[_n-1] if year > 81
(90 missing values generated)

. gen clwsta = lwsta - lwsta[_n-1] if year > 81
(90 missing values generated)

. gen clwloc = lwloc - lwloc[_n-1] if year > 81
(90 missing values generated)

reg clcrmrte clprbarr clprbcon clprbpri clavgsen clpolpc clwcon-clwloc
d83-d87

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------
       Model |  9.86742162    19   .51933798
    Residual |  12.3293822   520   .02371035
-------------+------------------------------
       Total |  22.1968038   539  .041181454

    clcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    clprbarr |  -.3230993   .0300195   -10.76   0.000    -.3820737   -.2641248
    clprbcon |  -.2402885   .0182474   -13.17   0.000    -.2761362   -.2044407
    clprbpri |  -.1693859     .02617    -6.47   0.000    -.2207978    -.117974
    clavgsen |  -.0156167   .0224126    -0.70   0.486    -.0596469    .0284136
     clpolpc |   .3977221    .026987    14.74   0.000     .3447051     .450739
      clwcon |  -.0442368   .0304142    -1.45   0.146    -.1039865     .015513
      clwtuc |   .0253997   .0142093     1.79   0.074     -.002515    .0533144
      clwtrd |  -.0290309   .0307907    -0.94   0.346    -.0895203    .0314586
      clwfir |    .009122   .0212318     0.43   0.668    -.0325886    .0508326
      clwser |   .0219549   .0144342     1.52   0.129    -.0064016    .0503113
      clwmfg |  -.1402482   .1019317    -1.38   0.169    -.3404967    .0600003
      clwfed |   .0174221   .1716065     0.10   0.919     -.319705    .3545493
      clwsta |  -.0517891   .0957109    -0.54   0.589    -.2398166    .1362385
      clwloc |  -.0305153   .1021028    -0.30   0.765       -.2311    .1700694
         d83 |  -.1108653   .0268105    -4.14   0.000    -.1635355   -.0581951
         d84 |  -.0374103    .024533    -1.52   0.128    -.0856063    .0107856
         d85 |  -.0005856    .024078    -0.02   0.981    -.0478877    .0467164
         d86 |   .0314757   .0245099     1.28   0.200    -.0166749    .0796262
         d87 |   .0388632   .0247819     1.57   0.117    -.0098218    .0875482
       _cons |   .0198522   .0206974     0.96   0.338    -.0208086     .060513


reg clcrmrte clprbarr clprbcon clprbpri clavgsen clpolpc clwcon-clwloc
d83-d87, cluster(county)

Linear regression                                      Number of obs =     540
                                                       F( 19,    89) =   13.66
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4445
                                                       Root MSE      =  .15398

                                (Std. Err. adjusted for 90 clusters in county)
             |               Robust
    clcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    clprbarr |  -.3230993   .0584771    -5.53   0.000    -.4392919   -.2069066
    clprbcon |  -.2402885   .0403223    -5.96   0.000     -.320408   -.1601689
    clprbpri |  -.1693859   .0459288    -3.69   0.000    -.2606455   -.0781263
    clavgsen |  -.0156167   .0267541    -0.58   0.561    -.0687765    .0375432
     clpolpc |   .3977221   .1038642     3.83   0.000     .1913461     .604098
      clwcon |  -.0442368   .0165835    -2.67   0.009    -.0771879   -.0112856
      clwtuc |   .0253997   .0123845     2.05   0.043      .000792    .0500075
      clwtrd |  -.0290309   .0180398    -1.61   0.111    -.0648755    .0068138
      clwfir |    .009122    .006921     1.32   0.191    -.0046299    .0228739
      clwser |   .0219549   .0180754     1.21   0.228    -.0139606    .0578703
      clwmfg |  -.1402482   .1190279    -1.18   0.242    -.3767541    .0962578
      clwfed |   .0174221      .1326     0.13   0.896    -.2460511    .2808954
      clwsta |  -.0517891   .0674058    -0.77   0.444     -.185723    .0821449
      clwloc |  -.0305153   .1269012    -0.24   0.811    -.2826652    .2216346
         d83 |  -.1108653   .0270368    -4.10   0.000    -.1645868   -.0571437
         d84 |  -.0374103   .0237018    -1.58   0.118    -.0845052    .0096845
         d85 |  -.0005856   .0256369    -0.02   0.982    -.0515257    .0503544
         d86 |   .0314757   .0214193     1.47   0.145     -.011084    .0740353
         d87 |   .0388632   .0263357     1.48   0.144    -.0134653    .0911917
       _cons |   .0198522   .0180545     1.10   0.274    -.0160217    .0557261

 ( 1)  clwcon = 0
 ( 2)  clwtuc = 0
 ( 3)  clwtrd = 0
 ( 4)  clwfir = 0
 ( 5)  clwser = 0
 ( 6)  clwmfg = 0
 ( 7)  clwfed = 0
 ( 8)  clwsta = 0
 ( 9)  clwloc = 0

       F(  9,    89) =    2.38
            Prob > F =    0.0184


xtreg lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc d82-d87 lwcon lwtuc
lwtrd lwfir lwser lwmfg lwfed lwsta lwloc, fe

Fixed-effects (within) regression               Number of obs      =       630
Group variable: county                          Number of groups   =        90
R-sq:  within  = 0.4575                         Obs per group: min =         7
       between = 0.2518                                        avg =       7.0
       overall = 0.2687                                        max =         7
                                                F(20,520)          =     21.92
corr(u_i, Xb)  = 0.0804                         Prob > F           =    0.0000

     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lprbarr |  -.3563515   .0321591   -11.08   0.000    -.4195292   -.2931738
    lprbconv |  -.2859539   .0210513   -13.58   0.000    -.3273099   -.2445979
    lprbpris |  -.1751355   .0323403    -5.42   0.000    -.2386693   -.1116017
     lavgsen |  -.0028739   .0262108    -0.11   0.913     -.054366    .0486181
      lpolpc |      .4229   .0263942    16.02   0.000     .3710476    .4747524
         d82 |   .0188915   .0251244     0.75   0.452    -.0304662    .0682492
         d83 |   -.055286   .0330287    -1.67   0.095    -.1201721    .0096001
         d84 |  -.0615162   .0410805    -1.50   0.135    -.1422204    .0191879
         d85 |  -.0397115   .0561635    -0.71   0.480    -.1500468    .0706237
         d86 |  -.0001133   .0680124    -0.00   0.999    -.1337262    .1334996
         d87 |   .0537042   .0798953     0.67   0.502    -.1032532    .2106615
       lwcon |  -.0345448   .0391616    -0.88   0.378    -.1114792    .0423896
       lwtuc |   .0459747    .019034     2.42   0.016     .0085817    .0833677
       lwtrd |  -.0201766   .0406073    -0.50   0.619    -.0999511    .0595979
       lwfir |  -.0035445    .028333    -0.13   0.900    -.0592058    .0521168
       lwser |   .0101264   .0191915     0.53   0.598     -.027576    .0478289
       lwmfg |  -.3005691   .1094068    -2.75   0.006    -.5155028   -.0856354
       lwfed |  -.3331226    .176448    -1.89   0.060    -.6797612     .013516
       lwsta |   .0215209   .1130648     0.19   0.849    -.2005991    .2436409
       lwloc |   .1810215   .1180643     1.53   0.126    -.0509203    .4129632
       _cons |   .8931726   1.424067     0.63   0.531     -1.90446    3.690805
-------------+----------------------------------------------------------------
     sigma_u |  .47756823
     sigma_e |  .13700505
         rho |  .92395784   (fraction of variance due to u_i)
F test that all u_i=0:     F(89, 520) =    39.12             Prob > F = 0.0000


d. There is strong evidence of negative serial correlation in the FD equation, suggesting 


that if the idiosyncratic errors follow an AR(1) process, the coefficient is less than unity. 
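To see why negative serial correlation in the differenced errors points to an AR(1) coefficient below one, suppose the idiosyncratic errors follow a stationary AR(1) process with coefficient ρ and variance σ^2. Then Cov(Δu_t, Δu_t-1) = -σ^2(1 - ρ)^2 and Var(Δu_t) = 2σ^2(1 - ρ), so the first-order autocorrelation of the differences is -(1 - ρ)/2. The sketch below is an added back-of-the-envelope calculation under that stationarity assumption, with hypothetical function names. It treats the coefficient from the residual regression (-.222, reported in the output that follows) as an estimate of this autocorrelation.

```python
def fd_autocorr(rho):
    """First-order autocorrelation of u_t - u_(t-1) when u_t is a stationary AR(1)."""
    return -(1.0 - rho) / 2.0


def implied_rho(fd_corr):
    """Invert fd_autocorr: the AR(1) coefficient consistent with a given FD autocorrelation."""
    return 1.0 + 2.0 * fd_corr


estimate = -0.222258  # coefficient on ehat_1 from the residual regression
rho_hat = implied_rho(estimate)

print(f"implied AR(1) coefficient: {rho_hat:.3f}")
print(f"FD autocorrelation if rho = 0: {fd_autocorr(0.0):.2f}")
```

Serially uncorrelated errors in levels would give an autocorrelation of -.5 in the differences, and a random walk would give zero. The estimate here falls between those two cases.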


qui reg clcrmrte clprbarr clprbcon clprbpri clavgsen clpolpc clwcon-clwloc 
d83-d87 


predict ehat, resid 
(90 missing values generated) 


gen ehat_1 = l.ehat 
(180 missing values generated) 


reg ehat ehat_1 


      Source |       SS       df       MS              Number of obs =     450
-------------+------------------------------           F(  1,   448) =   21.29
       Model |  .490534556     1  .490534556           Prob > F      =  0.0000
    Residual |  10.3219221   448  .023040005           R-squared     =  0.0454
-------------+------------------------------           Adj R-squared =  0.0432
       Total |  10.8124566   449  .024081195           Root MSE      =  .15179

        ehat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ehat_1 |   -.222258   .0481686    -4.61   0.000    -.3169225   -.1275936
       _cons |   5.97e-10   .0071554     0.00   1.000    -.0140624    .0140624


10.10. a. To allow for different intercepts in the original model we can include a year
dummy for 1993 in the FD equation. (The three years of data are 1987, 1990, and 1993.) There
is no evidence of serial correlation in the FD errors, e_it = u_it - u_i,t-1, as the coefficient on
ê_i,t-1 is puny and so is its t statistic. It appears that a random walk for {u_it: t = 1,2,3} is a
reasonable characterization, although concluding this with T = 3 is tenuous.
use murder

xtset id year
       panel variable:  id (strongly balanced)
        time variable:  year, 87 to 93, but with gaps
                delta:  1 unit

reg cmrdrte d93 cexec cunem

      Source |       SS       df       MS              Number of obs =     102
-------------+------------------------------
       Model |  46.7620386     3  15.5873462
    Residual |  1812.28688    98  18.4927233
-------------+------------------------------
       Total |  1859.04892   101   18.406425

     cmrdrte |      Coef.   Std. Err.      t      [95% Conf. Interval]
-------------+---------------------------------------------------------
         d93 |  -1.296717   1.016118    -1.28    -3.313171    .7197368
       cexec |  -.1150682   .1473871    -0.78     -.407553    .1774167
       cunem |   .1630854   .3079049     0.53     -.447942    .7741127
       _cons |    1.51099   .6608967     2.29     .1994622    2.822518

predict ehat, resid
(51 missing values generated)

gen ehat_1 = l.ehat
(102 missing values generated)

reg ehat ehat_1

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------
       Model |  .075953071     1  .075953071
    Residual |  58.3045094    49  1.18988795
-------------+------------------------------
       Total |  58.3804625    50  1.16760925

        ehat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ehat_1 |   .0065807   .0260465     0.25   0.802    -.0457618    .0589231
       _cons |  -9.10e-10   .1527453    -0.00   1.000    -.3069532    .3069532


b. To make all of the FE and FD estimates comparable, the year dummies are differenced 


along with the other variables in the FD estimation, and no constant is included. (The 


R-squared for the FD equation is computed using the usual total sum of squares, but the FE and 


FD R-squareds are not directly comparable.) The FE and FD coefficient estimates are similar 


but, especially for the execution variable, the FD standard error is much smaller. Because these 


are fully robust it is sensible to compare them. Because we found no serial correlation in the 


FD errors, it makes sense that the FD estimator is more efficient than FE (whose idiosyncratic 


errors appear to follow a random walk). 


gen cd90 = d90 - d90[_n-1] if year > 87
(51 missing values generated)

gen cd93 = d93 - d93[_n-1] if year > 87
(51 missing values generated)

reg cmrdrte cd90 cd93 cexec cunem, nocons tsscons cluster(id)

Linear regression                                      Number of obs =     102
                                                       F(  4,    50) =    2.95
                                                       Prob > F      =  0.0291
                                                       R-squared     =  0.0252
                                                       Root MSE      =  4.3003

                                    (Std. Err. adjusted for 51 clusters in id)
             |               Robust
     cmrdrte |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+-------------------------------------------------
        cd90 |    1.51099   1.056408     -.6108675    3.632848
        cd93 |   1.725264   .8603626     -.0028256    3.453353
       cexec |  -.1150682   .0386021     -.1926027   -.0375337
       cunem |   .1630854   .2998749      -.439231    .7654018

xtreg mrdrte d90 d93 exec unem, fe cluster(id)

Fixed-effects (within) regression               Number of obs      =       153
Group variable: id                              Number of groups   =        51
R-sq:  within  = 0.0734                         Obs per group: min =         3
       between = 0.0037                                        avg =       3.0
       overall = 0.0108                                        max =         3
                                                F(4,50)            =      1.80
corr(u_i, Xb)  = 0.0010                         Prob > F           =    0.1443

                                    (Std. Err. adjusted for 51 clusters in id)
             |               Robust       95% Conf.
      mrdrte |      Coef.   Std. Err.    lower bound
-------------+---------------------------------------
         d90 |   1.556215   1.119004     -.6913706
         d93 |   1.733242   .8685105     -.0112126
        exec |  -.1383231   .0805733     -.3001593
        unem |   .2213158    .374899     -.5316909
       _cons |   5.822104   2.814864      .1682823
-------------+---------------------------------------
     sigma_u |  8.7527226
     sigma_e |  3.5214244
         rho |  .86068589   (fraction of variance due to u_i)

c. The explanatory variable exec_it might fail strict exogeneity if states increase future
executions in response to current positive shocks to the murder rate. Given the relatively short
stretch of time, feedback from murder rates to future executions may not be much of a concern,
as the judicial process in capital cases tends to move slowly. (Of course, if it were sped up
because of an increase in murder rates, that could violate strict exogeneity.) With a longer time
series we could add exec_i,t+1 (and even values from further in the future) and estimate the
equation by FE or FD, testing exec_i,t+1 for statistical significance.

10.11. a. The key coefficient is β1. Because AFDC participation gives women access to
better nutrition and prenatal care, we hope that AFDC participation causes the percent of
low-weight births to fall. This only makes sense with a ceteris paribus thought experiment,
holding fixed economic and other variables, such as demographic variables and quality of
other kinds of health care. A reasonable expectation is β2 < 0: more physicians means
relatively fewer low-weight births. The variable bedspc is another proxy for health-care
availability, and we expect β3 < 0. Higher per capita income should lead to lower lowbrth, too
(β4 < 0). The effect of population on a per capita variable is ambiguous, especially because it
is total population and not population density.

b. The Stata output follows. Both the usual and fully robust standard errors are computed.
The standard errors robust to serial correlation (and heteroskedasticity) are, as expected,
somewhat larger. (If you test for AR(1) serial correlation in the composite error, $v_{it}$, it is very
strong. In fact, the estimated $\rho$ is slightly above one.) Only the per capita income variable is
statistically significant. The estimate implies that a 10 percent rise in per capita income is
associated with a .25 percentage point fall in the percent of low-weight births.
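The .25 figure comes from the level-log functional form. A minimal sketch of the arithmetic, assuming the coefficient -2.494685 in the pooled OLS output belongs to lpcinc (the log of per capita income); the variable names in the code are illustrative only:

```python
import math

# Coefficient on lpcinc from the pooled OLS output (assumed to be the lpcinc row).
beta_lpcinc = -2.494685

# Usual approximation for a level-log model: change in lowbrth is roughly
# beta * (percent change in pcinc) / 100.
approx_change = beta_lpcinc * 0.10

# Change implied exactly by the functional form: beta * log(1.10).
exact_change = beta_lpcinc * math.log(1.10)

print(round(approx_change, 3), round(exact_change, 3))
```

The approximation gives the quarter of a percentage point quoted in the text; the exact calculation is slightly smaller in magnitude because log(1.10) is a little below .10.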


reg lowbrth d90 afdcprc lphypc lbedspc lpcinc lpopul 
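      Source |       SS       df       MS
-------------+------------------------------
       Model |  33.7710894     6   5.6285149
    Residual |  100.834005    93  1.08423661
-------------+------------------------------
       Total |                99  1.35964742

------------------------------------------------------------------------------
     lowbrth |      Coef.   Std. Err.      t       [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d90 |   .5797136   .2761244     2.10      .0313853    1.128042
     afdcprc |   .0955932   .0921802     1.04     -.0874584    .2786448
      lphypc |   .3080648     .71546     0.43     -1.112697    1.728827
     lbedspc |   .2790041   .5130275     0.54     -.7397668    1.297775
      lpcinc |  -2.494685   .9783021    -2.55       -4.4374   -.5519711
      lpopul |    .739284   .7023191     1.05     -.6553826    2.133951
       _cons |   26.57786   7.158022     3.71      12.36344    40.79227
------------------------------------------------------------------------------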


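reg lowbrth d90 afdcprc lphypc lbedspc lpcinc lpopul, cluster(state)

Linear regression                                      F(  6,    49)

                                 (Std. Err. adjusted for 50 clusters in state)
------------------------------------------------------------------------------
             |               Robust
     lowbrth |      Coef.   Std. Err.      [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d90 |   .5797136   .2214303      .1347327    1.024694
     afdcprc |   .0955932   .1199883     -.1455324    .3367188
      lphypc |   .3080648   .9063342     -1.513282    2.129411
     lbedspc |   .2790041   .7853754     -1.299267    1.857275
      lpcinc |  -2.494685   1.203901     -4.914014   -.0753567
      lpopul |    .739284   .9041915     -1.077757    2.556325
       _cons |   26.57786    9.29106      7.906773    45.24894
------------------------------------------------------------------------------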


c. The FD (equivalently, FE) estimates are given below. The heteroskedasticity-robust
standard error for the AFDC variable is actually smaller than the usual one. In any case, removing
the state unobserved effect changes the sign on the AFDC participation variable, and it is
marginally statistically significant. Oddly, physicians-per-capita now has a positive, significant
effect on percent of low-weight births. The hospital beds-per-capita variable has the expected
negative effect.
reg clowbrth cafdcprc clphypc clbedspc clpcinc clpopul

      Source |       SS       df       MS
-------------+------------------------------
       Model |  .861531934     5  .172306387
    Residual |  3.00026764    44  .068187901
-------------+------------------------------
       Total |  3.86179958    49  .078812236

------------------------------------------------------------------------------
    clowbrth |      Coef.   Std. Err.      t       [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cafdcprc |  -.1760763   .0903733    -1.95     -.3582116     .006059
     clphypc |   5.894509   2.816689     2.09      .2178452    11.57117
    clbedspc |  -1.576195   .8852111    -1.78     -3.360221    .2078308
     clpcinc |  -.8455268   1.356773    -0.62     -3.579924     1.88887
     clpopul |   3.441116   2.872175     1.20     -2.347372    9.229604
       _cons |   .1060158   .3090664     0.34     -.5168667    .7288983
------------------------------------------------------------------------------


reg clowbrth cafdcprc clphypc clbedspc clpcinc clpopul, robust

Linear regression

------------------------------------------------------------------------------
    clowbrth |      Coef.     [95% Conf.
-------------+----------------------------------------------------------------
    cafdcprc |  -.1760763     -.3307695
     clphypc |   5.894509     -.3504018
    clbedspc |  -1.576195     -4.067567
     clpcinc |  -.8455268       -3.8364
     clpopul |   3.441116     -1.975596
       _cons |   .1060158     -.6347664
------------------------------------------------------------------------------


d. Adding a quadratic in afdcprc yields a diminishing impact of AFDC participation. The
turning point in the quadratic is at about afdcprc = 6.4, and only four states have AFDC
participation rates above 6.4 percent. So, the largest effect is at low AFDC participation rates
and the effect is negative until afdcprc = 6.4. It is not clear this makes sense: if AFDC
participation increases then more women living in poverty get better prenatal care for their
children. But the quadratic is not statistically significant at the usual levels and we could safely
drop it.
reg clowbrth cafdcprc cafdcpsq clphypc clbedspc clpcinc clpopul, robust

Linear regression                                      Number of obs =      50
                                                       F(  6,    43) =    2.07
                                                       Prob > F      =  0.0762
                                                       R-squared     =  0.2499
                                                       Root MSE      =  .25956

------------------------------------------------------------------------------
             |               Robust
    clowbrth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cafdcprc |  -.5035049   .2612029    -1.93   0.061    -1.030271     .023261
    cafdcpsq |   .0396094   .0317531     1.25   0.219    -.0244267    .1036456
     clphypc |   6.620885   3.448026     1.92   0.061     -.332723    13.57449
    clbedspc |  -1.407963   1.344117    -1.05   0.301    -4.118634    1.302707
     clpcinc |  -.9987865   1.541609    -0.65   0.521    -4.107738    2.110165
     clpopul |   4.429026   2.925156     1.51   0.137    -1.470113    10.32817
       _cons |   .1245915    .386679     0.32   0.749     -.655221    .9044041
------------------------------------------------------------------------------

. di abs(_b[cafdcprc]/(2*_b[cafdcpsq]))
6.3558685

sum afdcprc if d90

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     afdcprc |        50    4.162976    1.317277   1.688183   7.358795

count if afdcprc >= 6.4 & d90
    4
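The turning point reported by the `di` command can be reproduced by hand. The following is a small sketch of that calculation using the rounded coefficients printed in the output; the function name `marginal_effect` is introduced here for illustration and is not part of the original session:

```python
# Coefficients on afdcprc and its square from the first-differenced regression
# (rounded values as printed in the Stata output).
b_afdc = -0.5035049
b_afdc_sq = 0.0396094

# For b1*a + b2*a^2 the slope b1 + 2*b2*a equals zero at a* = -b1 / (2*b2).
turning_point = abs(b_afdc / (2 * b_afdc_sq))

def marginal_effect(afdcprc):
    """Estimated effect on lowbrth of a one-point rise in afdcprc at this level."""
    return b_afdc + 2 * b_afdc_sq * afdcprc

print(round(turning_point, 3))
print(marginal_effect(4.162976) < 0)   # evaluated at the 1990 sample mean
```

Because the printed coefficients are rounded, the result agrees with Stata's 6.3558685 only to about four decimal places. At the 1990 mean participation rate the estimated marginal effect is still negative, which is the sense in which the effect is negative until afdcprc reaches roughly 6.4.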
10.12. a. Even if $c_i$ is uncorrelated with $\mathbf{x}_{it}$ for all $t$, the usual OLS standard errors do not
account for the serial correlation in $v_{it} = c_i + u_{it}$. You can see that the fully robust standard
errors are substantially larger than the usual ones, in some cases more than double.
use wagepan 
reg lwage educ black hisp exper expersq married union d81-d87, cluster(nr) 
Linear regression                                      Number of obs =    4360
                                                       F( 14,   544) =   47.10
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1893
                                                       Root MSE      =  .48033

                                   (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0913498   .0110822     8.24   0.000     .0695807    .1131189
       black |  -.1392342   .0505238    -2.76   0.006    -.2384798   -.0399887
        hisp |   .0160195   .0390781     0.41   0.682     -.060743     .092782
       exper |   .0672345   .0195958     3.43   0.001     .0287417    .1057273
     expersq |  -.0024117   .0010252    -2.35   0.019    -.0044255   -.0003979
     married |   .1082529    .026034     4.16   0.000     .0571135    .1593924
       union |   .1824613   .0274435     6.65   0.000     .1285531    .2363695
         d81 |     .05832    .028228     2.07   0.039     .0028707    .1137693
         d82 |   .0627744   .0369735     1.70   0.090    -.0098538    .1354027
         d83 |   .0620117    .046248     1.34   0.181    -.0288348    .1528583
         d84 |   .0904672    .057988     1.56   0.119    -.0234407     .204375
         d85 |   .1092463   .0668474     1.63   0.103    -.0220644     .240557
         d86 |   .1419596   .0762348     1.86   0.063     -.007791    .2917102
         d87 |   .1738334   .0852056     2.04   0.042     .0064611    .3412057
       _cons |   .0920558   .1609365     0.57   0.568    -.2240773    .4081888
------------------------------------------------------------------------------
reg lwage educ black hisp exper expersq married union d81-d87 

      Source |       SS       df       MS              Number of obs =    4360
-------------+------------------------------           F( 14,  4345) =   72.46
       Model |  234.048277    14  16.7177341           Prob > F      =  0.0000
    Residual |  1002.48136  4345  .230720682           R-squared     =  0.1893
-------------+------------------------------           Adj R-squared =  0.1867
       Total |  1236.52964  4359  .283672779           Root MSE      =  .48033

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0913498   .0052374    17.44   0.000     .0810819    .1016177
       black |  -.1392342   .0235796    -5.90   0.000    -.1854622   -.0930062
        hisp |   .0160195   .0207971     0.77   0.441    -.0247535    .0567925
       exper |   .0672345   .0136948     4.91   0.000     .0403856    .0940834
     expersq |  -.0024117     .00082    -2.94   0.003    -.0040192   -.0008042
     married |   .1082529   .0156894     6.90   0.000     .0774937    .1390122
       union |   .1824613   .0171568    10.63   0.000     .1488253    .2160973
         d81 |     .05832   .0303536     1.92   0.055    -.0011886    .1178286
         d82 |   .0627744   .0332141     1.89   0.059    -.0023421    .1278909
         d83 |   .0620117   .0366601     1.69   0.091    -.0098608    .1338843
         d84 |   .0904672   .0400907     2.26   0.024      .011869    .1690654
         d85 |   .1092463   .0433525     2.52   0.012     .0242533    .1942393
         d86 |   .1419596    .046423     3.06   0.002     .0509469    .2329723
         d87 |   .1738334    .049433     3.52   0.000     .0769194    .2707474
       _cons |   .0920558   .0782701     1.18   0.240    -.0613935    .2455051
------------------------------------------------------------------------------


b. The random effects estimates on the time-constant variables are similar to the pooled
OLS estimates. The coefficients on the quadratic in experience for RE show an initially
stronger effect of experience, but with the slope diminishing more rapidly. There are important
differences in the coefficients on the variables that change across individual and time; they are
notably lower for random effects. The random effects marriage premium is about 6.4%, while the
pooled OLS estimate is about 10.8%. For union status, the random effects estimate is 10.6%
compared with a pooled OLS estimate of 18.2%.

Note that the RE standard errors for the coefficients on the time-constant explanatory
variables are similar to the fully robust POLS standard errors. However, the RE standard errors
for married and union are substantially smaller than the robust POLS standard errors,
suggestive of the relative efficiency of RE. To be fair, we should compute the fully robust
standard errors for RE. As shown below, these are somewhat larger than the usual RE standard
errors, but for married and union they are still not nearly as large as the robust standard errors
for POLS. An important conclusion is that, even though RE might not be the asymptotically
efficient FGLS estimator, it appears to be more efficient than POLS, at least for the
time-varying explanatory variables.


xtreg lwage educ black hisp exper expersq married union d81-d87, re 


Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1799                         Obs per group: avg =       8.0
       between = 0.1860
       overall = 0.1830

Random effects u_i ~ Gaussian                   Wald chi2(14)      =    957.77
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0918763   .0106597     8.62   0.000     .0709836    .1127689
       black |  -.1393767   .0477228    -2.92   0.003    -.2329117   -.0458417
        hisp |   .0217317   .0426063     0.51   0.610    -.0617751    .1052385
       exper |   .1057545   .0153668     6.88   0.000     .0756361    .1358729
     expersq |  -.0047239   .0006895    -6.85   0.000    -.0060753   -.0033726
     married |    .063986   .0167742     3.81   0.000     .0311091    .0968629
       union |   .1061344   .0178539     5.94   0.000     .0711415    .1411273
         d81 |    .040462   .0246946     1.64   0.101    -.0079385    .0888626
         d82 |   .0309212   .0323416     0.96   0.339    -.0324672    .0943096
         d83 |   .0202806    .041582     0.49   0.626    -.0612186    .1017798
         d84 |   .0431187   .0513163     0.84   0.401    -.0574595    .1436969
         d85 |   .0578155   .0612323     0.94   0.345    -.0621977    .1778286
         d86 |   .0919476   .0712293     1.29   0.197    -.0476592    .2315544
         d87 |   .1349289   .0813135     1.66   0.097    -.0244427    .2943005
       _cons |   .0235864   .1506683     0.16   0.876     -.271718    .3188907
-------------+----------------------------------------------------------------
     sigma_u |  .32460315
     sigma_e |  .35099001
         rho |  .46100216   (fraction of variance due to u_i)
------------------------------------------------------------------------------


xtreg lwage educ black hisp exper expersq married union d81-d87, re cluster(nr)

Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1799                         Obs per group: avg =       8.0
       between = 0.1860
       overall = 0.1830

Random effects u_i ~ Gaussian                   Wald chi2(14)      =    610.97
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

                                   (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0918763   .0111455     8.24   0.000     .0700315    .1137211
       black |  -.1393767   .0509251    -2.74   0.006    -.2391882   -.0395653
        hisp |   .0217317   .0399157     0.54   0.586    -.0565015     .099965
       exper |   .1057545    .016379     6.46   0.000     .0736522    .1378568
     expersq |  -.0047239   .0007917    -5.97   0.000    -.0062756   -.0031723
     married |    .063986   .0189722     3.37   0.001     .0268013    .1011708
       union |   .1061344    .020844     5.09   0.000      .065281    .1469879
         d81 |    .040462   .0275684     1.47   0.142    -.0135711    .0944951
         d82 |   .0309212   .0350705     0.88   0.378    -.0378158    .0996581
         d83 |   .0202806    .043861     0.46   0.644    -.0656853    .1062466
         d84 |   .0431187   .0555848     0.78   0.438    -.0658254    .1520628
         d85 |   .0578155   .0645584     0.90   0.370    -.0687167    .1843476
         d86 |   .0919476   .0747028     1.23   0.218    -.0544671    .2383623
         d87 |   .1349289   .0848618     1.59   0.112    -.0313971    .3012549
       _cons |   .0235864   .1599577     0.15   0.883     -.289925    .3370977
-------------+----------------------------------------------------------------
     sigma_u |  .32460315
     sigma_e |  .35099001
         rho |  .46100216
------------------------------------------------------------------------------


c. The variable $exper_{it}$ is redundant because everyone in the sample works every year, so
$exper_{i,t+1} = exper_{it} + 1$, $t = 1,\ldots,7$, for all $i$. The effects of the initial levels of
experience, $exper_{i1}$, cannot be distinguished from $c_i$ because we are allowing $exper_{i1}$ to be
correlated with $c_i$. Then, because each experience variable follows the same linear time trend,
the effects cannot be separated from the aggregate time effects (year dummies).
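The redundancy can be seen numerically. The sketch below uses two hypothetical workers (not observations from the wage panel) whose experience rises by one each year, and shows that the within transformation turns both experience series into the same demeaned time trend, which the year dummies already span. The choice of eight periods assumes the years 1980 through 1987:

```python
T = 8  # assumed number of years per worker (1980 through 1987)

def within_demean(values):
    """Subtract the time average, as the FE (within) transformation does."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Two hypothetical workers with different initial experience, exper_i1.
exper_a = [1 + t for t in range(T)]
exper_b = [5 + t for t in range(T)]
trend = list(range(T))  # a linear time trend, a linear combination of year dummies

dm_a = within_demean(exper_a)
dm_b = within_demean(exper_b)
dm_trend = within_demean(trend)

print(dm_a == dm_b == dm_trend)
```

Initial experience drops out with the individual mean, and what remains is identical across workers, so a coefficient on experience cannot be estimated separately once a full set of year dummies is included.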
The FE estimates follow. The marriage and union premiums fall even more, although both
are still statistically significant and economically relevant. The fully robust standard errors are
somewhat larger than the usual FE standard errors.


xtreg lwage expersq married union d81-d87, fe 


Fixed-effects (within) regression
Group variable: nr

R-sq:  within  = 0.1806
       between = 0.0286
       overall = 0.0888                         F(10, 3805)

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expersq |  -.0051855   .0007044     0.000    -.0065666   -.0038044
     married |   .0466804   .0183104     0.011     .0107811    .0825796
       union |   .0800019   .0193103     0.000     .0421423    .1178614
         d81 |   .1511912   .0219489     0.000     .1081584     .194224
         d82 |   .2529709   .0244185     0.000     .2050963    .3008454
         d83 |   .3544437   .0292419     0.000     .2971125    .4117749
         d84 |   .4901148   .0362266     0.000     .4190894    .5611402
         d85 |   .6174823   .0452435     0.000     .5287784    .7061861
         d86 |   .7654966   .0561277     0.000     .6554532    .8755399
         d87 |   .9250249   .0687731     0.000     .7901893    1.059861
       _cons |   1.426019   .0183415     0.000     1.390058    1.461979
-------------+----------------------------------------------------------------
     sigma_u |  .39176195
     sigma_e |  .35099001
         rho |  .55472817   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(544, 3805)                  Prob > F = 0.0000


xtreg lwage expersq married union d81-d87, fe cluster(nr) 


Fixed-effects (within) regression
Group variable: nr

R-sq:  within  = 0.1806
       between = 0.0286
       overall = 0.0888                         F(10,544)          =     46.59
corr(u_i, Xb)  = -0.1222                        Prob > F           =    0.0000

                                   (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expersq |  -.0051855   .0008102    -6.40   0.000    -.0067771   -.0035939
     married |   .0466804   .0210038     2.22   0.027     .0054218    .0879389
       union |   .0800019   .0227431     3.52   0.000     .0353268    .1246769
         d81 |   .1511912   .0255648     5.91   0.000     .1009733    .2014091
         d82 |   .2529709   .0286624     8.83   0.000     .1966684    .3092733
         d83 |   .3544437   .0348608    10.17   0.000     .2859655     .422922
         d84 |   .4901148   .0454581    10.78   0.000     .4008199    .5794097
         d85 |   .6174823   .0568088    10.87   0.000     .5058908    .7290737
         d86 |   .7654966    .071244    10.74   0.000     .6255495    .9054436
         d87 |   .9250249   .0840563    11.00   0.000     .7599103     1.09014
       _cons |   1.426019   .0209824    67.96   0.000     1.384802    1.467235
-------------+----------------------------------------------------------------
     sigma_u |  .39176195
     sigma_e |  .35099001
         rho |  .55472817   (fraction of variance due to u_i)
------------------------------------------------------------------------------


d. The following Stata session adds the year dummy-education interaction terms. There is
no evidence that the return to education has changed over time for the population represented
by these men. The p-value for the joint robust test is about .89.


gen d81educ = d81*educ 
gen d82educ = d82*educ 
gen d83educ = d83*educ 
gen d84educ = d84*educ 
gen d85educ = d85*educ 
gen d86educ = d86*educ 
gen d87educ = d87*educ 


xtreg lwage expersq married union d81-d87 d81educ-d87educ, fe cluster(nr) 


Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1814
       between = 0.0211
       overall = 0.0784                         F(17,544)          =     28.33
corr(u_i, Xb)  = -0.1732                        Prob > F           =    0.0000

                                   (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expersq |  -.0060437   .0010323    -5.85   0.000    -.0080715   -.0040159
     married |   .0474337   .0210293     2.26   0.024      .006125    .0887423
       union |   .0789759    .022762     3.47   0.001     .0342638     .123688
         d81 |   .0984201   .1463954     0.67   0.502    -.1891495    .3859897
         d82 |   .2472016   .1490668     1.66   0.098    -.0456155    .5400186
         d83 |    .408813   .1716953     2.38   0.018      .071546      .74608
         d84 |   .6399247   .1873708     3.42   0.001     .2718659    1.007984
         d85 |   .7729397   .2090195     3.70   0.000     .3623554    1.183524
         d86 |   .9699322   .2463734     3.94   0.000     .4859724    1.453892
         d87 |   1.188777   .2580167     4.61   0.000     .6819456    1.695608
     d81educ |   .0049906   .0122858     0.41   0.685    -.0191429    .0291241
     d82educ |    .001651    .012194     0.14   0.892    -.0223021     .025604
     d83educ |  -.0026621   .0136788    -0.19   0.846    -.0295319    .0242076
     d84educ |  -.0098257               -0.67   0.504    -.0386757    .0190243
     d85educ |  -.0092145               -0.61   0.542    -.0389085    .0204796
     d86educ |  -.0121382               -0.72   0.472    -.0452558    .0209794
     d87educ |  -.0157892               -0.97   0.335    -.0479172    .0163389
       _cons |   1.436283               63.24   0.000     1.391668    1.480897
-------------+----------------------------------------------------------------
     sigma_u |  .39876325
     sigma_e |   .3511451
         rho |  .56324361
------------------------------------------------------------------------------

testparm d81educ-d87educ

 ( 1)  d81educ = 0
 ( 2)  d82educ = 0
 ( 3)  d83educ = 0
 ( 4)  d84educ = 0
 ( 5)  d85educ = 0
 ( 6)  d86educ = 0
 ( 7)  d87educ = 0

       F(  7,   544) =    0.43
            Prob > F =    0.8851


e. First, I created the lead variable, and then included it in the FE estimation with fully
robust inference. As you can see, unionp1 is statistically significant with p-value = .029, and
its coefficient is not small. It seems $union_{it}$ fails the strict exogeneity assumption, and we
possibly should use an IV approach as described in Chapter 11. (However, coming up with
instruments is not trivial.)


gen unionp1 = union[_n+1] if year < 1987
(545 missing values generated) 
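The `union[_n+1]` construction relies on the data being sorted by worker and then by year, and the `if year < 1987` condition keeps the last year of one worker from borrowing the first observation of the next worker, which is why 545 values are missing. The same logic can be sketched outside Stata; the toy panel and the helper `add_lead` below are hypothetical and serve only to illustrate the idea:

```python
from itertools import groupby

# Hypothetical toy panel of (nr, year, union); not the wagepan data.
rows = [(1, 1985, 0), (1, 1986, 1), (1, 1987, 1),
        (2, 1985, 1), (2, 1986, 0), (2, 1987, 0)]

def add_lead(rows):
    """Append next year's union status within each worker; None in the last year."""
    out = []
    ordered = sorted(rows)  # sort by nr, then year
    for _, grp in groupby(ordered, key=lambda r: r[0]):
        grp = list(grp)
        for k, (nr, year, union) in enumerate(grp):
            lead = grp[k + 1][2] if k + 1 < len(grp) else None
            out.append((nr, year, union, lead))
    return out

for record in add_lead(rows):
    print(record)
```

Each worker loses the final year when the lead is included as a regressor, which matches the drop from 4,360 to 3,815 observations in the regression that follows.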


xtreg lwage expersq married union unionp1 d81-d86, fe cluster(nr)

Fixed-effects (within) regression               Number of obs      =      3815
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1474                         Obs per group: avg =       7.0
       between = 0.0305
       overall = 0.0744                         F(10,544)          =     31.29
corr(u_i, Xb)  = -0.1262                        Prob > F           =    0.0000

                                   (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expersq |  -.0054448   .0009786     0.000    -.0073671   -.0035226
     married |   .0448778   .0235662     0.057    -.0014141    .0911697
       union |   .0763554   .0236392     0.001     .0299202    .1227906
     unionp1 |   .0497356   .0227497     0.029     .0050477    .0944236
         d81 |   .1528275   .0257236     0.000     .1022977    .2033573
         d82 |   .2576486   .0304975     0.000     .1977413    .3175558
         d83 |   .3618296   .0384587     0.000     .2862839    .4373754
         d84 |   .5023642   .0517471     0.000     .4007155    .6040129
         d85 |   .6342402    .065288     0.000     .5059928    .7624876
         d86 |   .7841312   .0826431     0.000     .6217924    .9464699
       _cons |   1.417924   .0225168     0.000     1.373693    1.462154
-------------+----------------------------------------------------------------
     sigma_u |  .39716048
     sigma_e |  .35740734
         rho |   .5525375   (fraction of variance due to u_i)
------------------------------------------------------------------------------


f. The Stata output shows that the union premium for Hispanic men is well below that of
non-black men: about 13 percentage points lower. The difference is statistically significant,
too. The estimated union “premium” is actually about -3.5% for Hispanics, although it is not
statistically different from zero. The estimated union wage premium for black men is about 7.1
percentage points higher than for the base group, but the difference is not statistically significant.
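The group-specific premiums quoted above are sums of the coefficient on union and the relevant interaction. A short sketch of the arithmetic, using the FE point estimates (with the Hispanic interaction entered as negative, consistent with the text); the dictionary keys are labels chosen here for illustration:

```python
# FE point estimates: union and its interactions with black and hisp.
b_union = 0.0957205
b_black_union = 0.0714378
b_hisp_union = -0.1302478

# Union premium for each group is the base coefficient plus the interaction.
premium = {
    "base": b_union,
    "black": b_union + b_black_union,
    "hispanic": b_union + b_hisp_union,
}

for group, value in premium.items():
    print(group, round(100 * value, 1))
```

The Hispanic figure is the quantity tested by `lincom union + hisp_union`; the values are in log points times 100, so they are approximate percentage effects.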


gen black_union = black*union 


gen hisp_union = hisp*union 


xtreg lwage expersq married union black_union hisp_union d81-d87, fe 


Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1830                         Obs per group: avg =       8.0
       between = 0.0267
       overall = 0.0871                         F(12, 3803)        =     70.99
corr(u_i, Xb)  = -0.1360                        Prob > F           =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expersq |   -.005308   .0007048     -.0066898   -.0039262
     married |   .0461639   .0182922      .0103004    .0820275
       union |   .0957205   .0244326      .0478183    .1436227
 black_union |   .0714378   .0532042     -.0328737    .1757492
  hisp_union |  -.1302478   .0485409     -.2254166   -.0350791
         d81 |   .1507003   .0219236      .1077172    .1936833
         d82 |   .2545937   .0243936       .206768    .3024194
         d83 |   .3576139    .029227      .3003119     .414916
         d84 |   .4947141   .0362132       .423715    .5657132
         d85 |   .6236823   .0452345      .5349961    .7123686
         d86 |   .7750896   .0561524       .664998    .8851813
         d87 |   .9344805   .0687783      .7996347    1.069326
       _cons |    1.42681   .0183207      1.390891    1.462729
-------------+----------------------------------------------------------------
     sigma_u |    .393505
     sigma_e |  .35056318
         rho |  .55752062   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(544, 3803)                  Prob > F = 0.0000


xtreg lwage expersq married union black_union hisp_union d81-d87, fe cluster(nr)

Fixed-effects (within) regression
Group variable: nr

R-sq:  within  = 0.1830
       between = 0.0267
       overall = 0.0871                         F(12,544)          =     40.16
corr(u_i, Xb)  = -0.1360                        Prob > F           =    0.0000

                                   (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expersq |   -.005308   .0008095     -.0068982   -.0037178
     married |   .0461639   .0209641      .0049834    .0873444
       union |   .0957205   .0304494      .0359077    .1555333
 black_union |   .0714378   .0600866     -.0465925     .189468
  hisp_union |  -.1302478   .0493283     -.2271451   -.0333505
         d81 |   .1507003    .025519      .1005725    .2008281
         d82 |   .2545937   .0286062      .1984014    .3107859
         d83 |   .3576139   .0348463      .2891641    .4260638
         d84 |   .4947141   .0453929      .4055473    .5838809
         d85 |   .6236823    .056721      .5122633    .7351014
         d86 |   .7750896    .071142       .635343    .9148363
         d87 |   .9344805   .0838788      .7697145    1.099247
       _cons |    1.42681   .0209431      1.385671    1.467949
-------------+----------------------------------------------------------------
     sigma_u |    .393505
     sigma_e |  .35056318
         rho |  .55752062   (fraction of variance due to u_i)
------------------------------------------------------------------------------

lincom union + hisp_union

 ( 1)  union + hisp_union = 0


g. We reject the null hypothesis that $\{union_{it} : t = 1,\ldots,T\}$ is strictly exogenous even
when the union premium is allowed to differ by race and ethnicity.


xtreg lwage expersq married union black_union hisp_union d81-d86 unionp1, fe cluster(nr)

Fixed-effects (within) regression               Number of obs      =      3815
Group variable: nr                              Number of groups   =       545

R-sq:  within  = 0.1497                         Obs per group: avg =       7.0
       between = 0.0293
       overall = 0.0735                         F(12,544)          =     27.21
corr(u_i, Xb)  = -0.1386                        Prob > F           =    0.0000

                                   (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     expersq |  -.0055477   .0009833    -5.64   0.000    -.0074793   -.0036162
     married |    .044659   .0235905     1.89   0.059    -.0016807    .0909986
       union |   .0886305    .031931     2.78   0.006     .0259075    .1513536
 black_union |   .0849246   .0627531     1.35   0.177    -.0383434    .2081926
  hisp_union |  -.1179177   .0525974    -2.24   0.025    -.2212365   -.0145988
         d81 |   .1522945   .0256633     5.93   0.000     .1018831    .2027058
         d82 |   .2589966   .0304368     8.51   0.000     .1992085    .3187846
         d83 |   .3643699   .0385023     9.46   0.000     .2887385    .4400013
         d84 |    .506142   .0518005     9.77   0.000     .4043885    .6078955
         d85 |   .6393639   .0654104     9.77   0.000     .5108761    .7678517
         d86 |   .7921705   .0828197     9.57   0.000     .6294849     .954856
     unionp1 |   .0502016   .0227235     2.21   0.028     .0055649    .0948382
       _cons |   1.418021    .022508    63.00   0.000     1.373808    1.462234
-------------+----------------------------------------------------------------
     sigma_u |  .39854263
     sigma_e |  .35703614
         rho |   .5547681   (fraction of variance due to u_i)
------------------------------------------------------------------------------


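10.13. a. Showing that this procedure is consistent with fixed $T$ as $N \to \infty$ requires some
algebra. First, in the sum of squared residuals, we can “concentrate out” the $a_i$ by finding $\hat a_i(\mathbf{b})$
as a function of $(\mathbf{x}_i, \mathbf{y}_i)$ and $\mathbf{b}$, substituting back into the sum of squared residuals, and then
minimizing with respect to $\mathbf{b}$ only. Straightforward algebra gives the first order conditions for
each $i$ as

$$\sum_{t=1}^{T} (y_{it} - \hat a_i - \mathbf{x}_{it}\mathbf{b})/h_{it} = 0,$$

which implies

$$\hat a_i(\mathbf{b}) = w_i \sum_{t=1}^{T} (y_{it}/h_{it}) - w_i \Big( \sum_{t=1}^{T} \mathbf{x}_{it}/h_{it} \Big)\mathbf{b} \equiv \bar y_i^w - \bar{\mathbf{x}}_i^w \mathbf{b},$$

where $w_i \equiv \big( \sum_{t=1}^{T} 1/h_{it} \big)^{-1} > 0$ and $\bar y_i^w \equiv w_i \sum_{t=1}^{T} (y_{it}/h_{it})$, and a similar definition holds for
$\bar{\mathbf{x}}_i^w$. Note that $\bar y_i^w$ and $\bar{\mathbf{x}}_i^w$ are weighted averages with weights $w_i/h_{it}$, $t = 1,2,\ldots,T$. If $h_{it}$ equals
the same constant for all $t$, $\bar y_i^w$ and $\bar{\mathbf{x}}_i^w$ are the usual time averages.

Now we can plug each $\hat a_i(\mathbf{b})$ into the SSR to get the problem solved by $\hat{\boldsymbol\beta}_{FEWLS}$:

$$\min_{\mathbf{b}} \sum_{i=1}^{N} \sum_{t=1}^{T} [(y_{it} - \bar y_i^w) - (\mathbf{x}_{it} - \bar{\mathbf{x}}_i^w)\mathbf{b}]^2 / h_{it}.$$

This is just a pooled weighted least squares regression of $(y_{it} - \bar y_i^w)$ on $(\mathbf{x}_{it} - \bar{\mathbf{x}}_i^w)$ with weights
$1/h_{it}$. Equivalently, define $\tilde y_{it} \equiv (y_{it} - \bar y_i^w)/\sqrt{h_{it}}$, $\tilde{\mathbf{x}}_{it} \equiv (\mathbf{x}_{it} - \bar{\mathbf{x}}_i^w)/\sqrt{h_{it}}$, all
$t = 1,\ldots,T$, $i = 1,\ldots,N$. Then $\hat{\boldsymbol\beta}_{FEWLS}$ can be expressed in usual pooled OLS form:

$$\hat{\boldsymbol\beta}_{FEWLS} = \Big( \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{\mathbf{x}}_{it}'\tilde{\mathbf{x}}_{it} \Big)^{-1} \Big( \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{\mathbf{x}}_{it}'\tilde y_{it} \Big). \qquad (10.90)$$

Note carefully how the initial $y_{it}$ are weighted by $1/h_{it}$ to obtain $\bar y_i^w$, but where the usual
$1/\sqrt{h_{it}}$ weighting shows up in the sum of squared residuals on the time-demeaned data (where
the demeaning is a weighted average). Given (10.90), we can easily study the asymptotic
($N \to \infty$) properties of $\hat{\boldsymbol\beta}_{FEWLS}$. First, we can write $\bar y_i^w = \bar{\mathbf{x}}_i^w \boldsymbol\beta + c_i + \bar u_i^w$, where $\bar u_i^w \equiv w_i \sum_{t=1}^{T} (u_{it}/h_{it})$.
Subtracting this equation from $y_{it} = \mathbf{x}_{it}\boldsymbol\beta + c_i + u_{it}$ for all $t$ gives $\tilde y_{it} = \tilde{\mathbf{x}}_{it}\boldsymbol\beta + \tilde u_{it}$, where
$\tilde u_{it} \equiv (u_{it} - \bar u_i^w)/\sqrt{h_{it}}$. When we plug this in for $\tilde y_{it}$ in (10.90) and divide by $N$ in the appropriate
places we get

$$\hat{\boldsymbol\beta}_{FEWLS} = \boldsymbol\beta + \Big( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{\mathbf{x}}_{it}'\tilde{\mathbf{x}}_{it} \Big)^{-1} \Big( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{\mathbf{x}}_{it}' u_{it}/\sqrt{h_{it}} \Big), \qquad (10.91)$$

where the term in $\bar u_i^w$ drops out because $\sum_{t=1}^{T} \tilde{\mathbf{x}}_{it}'/\sqrt{h_{it}} = \mathbf{0}$ for each $i$.
From this last equation we can immediately read off the consistency of $\hat{\boldsymbol\beta}_{FEWLS}$ regardless of
whether $Var(u_{it}|\mathbf{x}_i,\mathbf{h}_i,c_i) = \sigma_u^2 h_{it}$. Why? We assumed that $E(u_{it}|\mathbf{x}_i,\mathbf{h}_i,c_i) = 0$, which means $u_{it}$
is uncorrelated with any function of $(\mathbf{x}_i,\mathbf{h}_i)$, including $\tilde{\mathbf{x}}_{it}$. Therefore, $E(\tilde{\mathbf{x}}_{it}'u_{it}) = \mathbf{0}$, $t = 1,\ldots,T$,
under $E(u_{it}|\mathbf{x}_i,\mathbf{h}_i,c_i) = 0$ without any restrictions on the conditional second moments of
$\{u_{it} : t = 1,\ldots,T\}$. As long as we assume that $\sum_{t=1}^{T} E(\tilde{\mathbf{x}}_{it}'\tilde{\mathbf{x}}_{it})$ has rank $K$, we can apply the
consistency result for pooled OLS to conclude $\mathrm{plim}(\hat{\boldsymbol\beta}_{FEWLS}) = \boldsymbol\beta$. (We can even show that
$E(\hat{\boldsymbol\beta}_{FEWLS}|\mathbf{X},\mathbf{H}) = \boldsymbol\beta$, that is, the FEWLS estimator is conditionally unbiased.)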
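The steps in part a (weighted demeaning, division by $\sqrt{h_{it}}$, then pooled OLS) can be sketched in a few lines. This is a minimal illustration for a single scalar regressor, using made-up data; the function name `fewls` and the numbers are assumptions introduced here, not part of the original solution:

```python
def fewls(y, x, h):
    """FEWLS slope for a scalar regressor.

    y, x, h are lists with one inner list of length T per cross-section unit.
    """
    num = 0.0
    den = 0.0
    for yi, xi, hi in zip(y, x, h):
        # w_i = (sum_t 1/h_it)^(-1); weighted averages use weights w_i/h_it.
        w = 1.0 / sum(1.0 / hit for hit in hi)
        ybar_w = w * sum(yit / hit for yit, hit in zip(yi, hi))
        xbar_w = w * sum(xit / hit for xit, hit in zip(xi, hi))
        for yit, xit, hit in zip(yi, xi, hi):
            # Weighted-demeaned data divided by sqrt(h_it), as in (10.90).
            x_tilde = (xit - xbar_w) / hit ** 0.5
            y_tilde = (yit - ybar_w) / hit ** 0.5
            num += x_tilde * y_tilde
            den += x_tilde * x_tilde
    return num / den

# Hypothetical noiseless data: y_it = 2*x_it + c_i with unit-specific intercepts.
x = [[1.0, 2.0, 4.0], [0.0, 3.0, 5.0]]
c = [10.0, -3.0]
h = [[1.0, 2.0, 0.5], [4.0, 1.0, 1.0]]
y = [[2.0 * xit + ci for xit in xi] for xi, ci in zip(x, c)]

beta_hat = fewls(y, x, h)
print(beta_hat)
```

Because the example has no idiosyncratic error, the weighted demeaning removes $c_i$ exactly and the estimate equals the true slope up to floating-point rounding, whatever positive values of $h_{it}$ are used.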
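b. It is clear from (10.91) that $\hat{\boldsymbol\beta}_{FEWLS}$ is $\sqrt{N}$-asymptotically normal under mild assumptions
because we can write

$$\sqrt{N}(\hat{\boldsymbol\beta}_{FEWLS} - \boldsymbol\beta) = \Big( N^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{\mathbf{x}}_{it}'\tilde{\mathbf{x}}_{it} \Big)^{-1} \Big( N^{-1/2} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{\mathbf{x}}_{it}' u_{it}/\sqrt{h_{it}} \Big).$$

The asymptotic variance is generally

$$\mathrm{Avar}\, \sqrt{N}(\hat{\boldsymbol\beta}_{FEWLS} - \boldsymbol\beta) = \mathbf{A}^{-1}\mathbf{B}\mathbf{A}^{-1},$$

where

$$\mathbf{A} \equiv \sum_{t=1}^{T} E(\tilde{\mathbf{x}}_{it}'\tilde{\mathbf{x}}_{it}), \qquad
\mathbf{B} \equiv E\Big[ \Big( \sum_{t=1}^{T} \tilde{\mathbf{x}}_{it}' u_{it}/\sqrt{h_{it}} \Big) \Big( \sum_{t=1}^{T} \tilde{\mathbf{x}}_{it}' u_{it}/\sqrt{h_{it}} \Big)' \Big].$$

If we assume that $Cov(u_{it}, u_{is}|\mathbf{x}_i,\mathbf{h}_i,c_i) = E(u_{it}u_{is}|\mathbf{x}_i,\mathbf{h}_i,c_i) = 0$, $t \neq s$, then by a standard
iterated expectations argument,

$$E\big[ (\tilde{\mathbf{x}}_{it}' u_{it}/\sqrt{h_{it}}) (\tilde{\mathbf{x}}_{is}' u_{is}/\sqrt{h_{is}})' \big] = E\big[ u_{it}u_{is}\tilde{\mathbf{x}}_{it}'\tilde{\mathbf{x}}_{is} / (\sqrt{h_{it}}\sqrt{h_{is}}) \big] = \mathbf{0}, \quad t \neq s.$$

Further, given the variance assumption $Var(u_{it}|\mathbf{x}_i,\mathbf{h}_i,c_i) = E(u_{it}^2|\mathbf{x}_i,\mathbf{h}_i,c_i) = \sigma_u^2 h_{it}$, iterated
expectations implies

$$E\big[ (\tilde{\mathbf{x}}_{it}' u_{it}/\sqrt{h_{it}}) (\tilde{\mathbf{x}}_{it}' u_{it}/\sqrt{h_{it}})' \big] = E\big[ u_{it}^2 \tilde{\mathbf{x}}_{it}'\tilde{\mathbf{x}}_{it} / h_{it} \big] = \sigma_u^2 E(\tilde{\mathbf{x}}_{it}'\tilde{\mathbf{x}}_{it}).$$

It follows then that

$$\mathbf{B} = \sigma_u^2 \mathbf{A}$$

and so

$$\mathrm{Avar}\, \sqrt{N}(\hat{\boldsymbol\beta}_{FEWLS} - \boldsymbol\beta) = \sigma_u^2 \mathbf{A}^{-1}.$$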
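c. The same subtleties that arise in estimating $\sigma_u^2$ for the usual fixed effects estimator crop
up here as well. Assume the zero conditional covariance assumption and correct variance
specification in part b. Then the residuals from the pooled OLS regression

$$\tilde y_{it} \text{ on } \tilde{\mathbf{x}}_{it}, \quad t = 1,\ldots,T, \; i = 1,\ldots,N, \qquad (10.92)$$

say $\hat{\tilde u}_{it}$, are estimating $\tilde u_{it} = (u_{it} - \bar u_i^w)/\sqrt{h_{it}}$ in the sense that we obtain $\hat{\tilde u}_{it}$ from $\tilde u_{it}$ by replacing
$\boldsymbol\beta$ with $\hat{\boldsymbol\beta}_{FEWLS}$. Now

$$E(\tilde u_{it}^2) = E[(u_{it}^2/h_{it})] - 2E[(u_{it}\bar u_i^w)/h_{it}] + E[(\bar u_i^w)^2/h_{it}] = \sigma_u^2 - 2\sigma_u^2 E[(w_i/h_{it})] + \sigma_u^2 E[(w_i/h_{it})],$$

where the law of iterated expectations is applied several times, and $E[(\bar u_i^w)^2|\mathbf{x}_i,\mathbf{h}_i] = \sigma_u^2 w_i$ has
been used. Therefore, $E(\tilde u_{it}^2) = \sigma_u^2[1 - E(w_i/h_{it})]$, $t = 1,\ldots,T$, and so

$$\sum_{t=1}^{T} E(\tilde u_{it}^2) = \sigma_u^2 \Big\{ T - E\Big[ w_i \cdot \sum_{t=1}^{T} (1/h_{it}) \Big] \Big\} = \sigma_u^2 (T - 1).$$

This contains the usual result for the within transformation as a special case. A consistent
estimator of $\sigma_u^2$ is $\hat\sigma_u^2 = SSR/[N(T-1) - K]$, where SSR is the usual sum of squared residuals from
(10.92), and the subtraction of $K$ is optional as a degrees-of-freedom adjustment. The estimator
of $\mathrm{Avar}(\hat{\boldsymbol\beta}_{FEWLS})$ is then

$$\hat\sigma_u^2 \Big( \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{\mathbf{x}}_{it}'\tilde{\mathbf{x}}_{it} \Big)^{-1}.$$

d. If we want to allow serial correlation in the $\{u_{it}\}$, or allow $Var(u_{it}|\mathbf{x}_i,\mathbf{h}_i,c_i) \neq \sigma_u^2 h_{it}$, then
we can just apply the robust formula for the pooled OLS regression (10.92). See equation
(7.77) in the text.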
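For part c, the two conditional moments behind the expression for $E(\tilde u_{it}^2)$ can be written out explicitly. The following is a worked sketch under the assumptions of part b (zero conditional covariances across time and conditional variance $\sigma_u^2 h_{it}$), using the definition $\bar u_i^w = w_i \sum_s u_{is}/h_{is}$ with $w_i = (\sum_s 1/h_{is})^{-1}$:

```latex
\begin{align*}
E(u_{it}\bar u_i^w \mid \mathbf{x}_i,\mathbf{h}_i)
  &= w_i \sum_{s=1}^{T} \frac{E(u_{it}u_{is} \mid \mathbf{x}_i,\mathbf{h}_i)}{h_{is}}
   = w_i \frac{\sigma_u^2 h_{it}}{h_{it}}
   = \sigma_u^2 w_i, \\
E[(\bar u_i^w)^2 \mid \mathbf{x}_i,\mathbf{h}_i]
  &= w_i^2 \sum_{s=1}^{T}\sum_{r=1}^{T} \frac{E(u_{is}u_{ir} \mid \mathbf{x}_i,\mathbf{h}_i)}{h_{is}h_{ir}}
   = w_i^2 \sum_{s=1}^{T} \frac{\sigma_u^2 h_{is}}{h_{is}^2}
   = \sigma_u^2 w_i^2 \sum_{s=1}^{T} \frac{1}{h_{is}}
   = \sigma_u^2 w_i .
\end{align*}
```

Dividing each line by $h_{it}$ and taking expectations gives the terms $\sigma_u^2 E(w_i/h_{it})$ that appear in part c, and summing $w_i/h_{it}$ over $t$ gives one, which is where the $T - 1$ comes from.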

10.14. a. Because $E(h_i|\mathbf{z}_i) = 0$, $\mathbf{z}_i\boldsymbol\gamma$ and $h_i$ are uncorrelated, so
$Var(c_i) = Var(\mathbf{z}_i\boldsymbol\gamma) + \sigma_h^2 = \boldsymbol\gamma' Var(\mathbf{z}_i)\boldsymbol\gamma + \sigma_h^2 \geq \sigma_h^2$. Assuming that $Var(\mathbf{z}_i)$ is positive definite
(which we must to satisfy the RE rank condition), strict inequality holds whenever $\boldsymbol\gamma \neq \mathbf{0}$. Of
course $\boldsymbol\gamma' Var(\mathbf{z}_i)\boldsymbol\gamma > 0$ is possible even if $Var(\mathbf{z}_i)$ is not positive definite.

b. If we estimate the model by fixed effects, the associated estimate of the variance of the
unobserved effect is $\sigma_c^2$. If we estimate the model by random effects (with, of course, $\mathbf{z}_i$
included), the variance component is $\sigma_h^2$. This makes intuitive sense: with random effects we
are able to explicitly control for time-constant variables, and so the $\mathbf{z}_i$ are effectively taken out
of $c_i$ with $h_i$ as the remainder.

c. Using equation (10.81), we obtain $\lambda_h$ and $\lambda_c$ as follows:

$$\lambda_h = 1 - \{1/[1 + T(\sigma_h^2/\sigma_u^2)]\}^{1/2}$$
$$\lambda_c = 1 - \{1/[1 + T(\sigma_c^2/\sigma_u^2)]\}^{1/2}$$

so

$$\lambda_c - \lambda_h = \{1/[1 + T(\sigma_h^2/\sigma_u^2)]\}^{1/2} - \{1/[1 + T(\sigma_c^2/\sigma_u^2)]\}^{1/2}.$$

Therefore, $\lambda_c - \lambda_h \geq 0$ iff $1/[1 + T(\sigma_h^2/\sigma_u^2)] \geq 1/[1 + T(\sigma_c^2/\sigma_u^2)]$ (because $1 + T(\sigma_h^2/\sigma_u^2)$ and
$1 + T(\sigma_c^2/\sigma_u^2)$ are positive). We conclude that

$$\lambda_c - \lambda_h \geq 0 \text{ iff } T(\sigma_c^2/\sigma_u^2) \geq T(\sigma_h^2/\sigma_u^2),$$

which holds because we already showed that $\sigma_c^2 \geq \sigma_h^2$ (often with strict inequality).
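To see the size of the difference between the two quasi-time-demeaning parameters, the formula for $\lambda$ can be evaluated at some illustrative values. The variance values below are hypothetical, chosen only so that $\sigma_c^2 \geq \sigma_h^2$; the function name `quasi_demeaning` is likewise introduced here for illustration:

```python
import math

def quasi_demeaning(T, sigma2_het, sigma2_u):
    """lambda = 1 - {1/[1 + T*(sigma2_het/sigma2_u)]}^(1/2)."""
    return 1.0 - math.sqrt(1.0 / (1.0 + T * sigma2_het / sigma2_u))

T = 8
sigma2_u = 1.0
sigma2_h = 0.5   # hypothetical Var(h_i)
sigma2_c = 0.8   # hypothetical Var(c_i) = Var(h_i) + gamma'Var(z_i)gamma

lam_h = quasi_demeaning(T, sigma2_h, sigma2_u)
lam_c = quasi_demeaning(T, sigma2_c, sigma2_u)

print(round(lam_h, 4), round(lam_c, 4))
```

With these numbers $\lambda_c$ exceeds $\lambda_h$, so using the variance of $c_i$ in place of the variance of $h_i$ would remove too large a fraction of the time averages.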

d. If we use FE to estimate the heterogeneity variance then we are estimating $\sigma_c^2$, which
means $\lambda_c$ is effectively the quasi-time demeaning parameter used in subsequent RE estimation.
If we use POLS, then we estimate $\sigma_h^2$, which then delivers the appropriate quasi-time
demeaning parameter $\lambda_h$. Thus, we should use POLS, not FE, as the initial estimator for RE
estimation when the model includes time-constant variables.

e. Because Problem 7.15 contains a more general result, a separate proof is not provided
here. One need not have an RE structure and, as mentioned in the text, we do not need
homoskedasticity of $Var(c_i|\mathbf{z}_i)$, either.


10.15. a. Because $\mathbf{v}_i$ is independent of $\mathbf{x}_i$, $\bar v_i$ is also independent of $\mathbf{x}_i$. Therefore,

$$E(v_{it}|\mathbf{x}_i, \bar v_i) = E(v_{it}|\bar v_i), \quad t = 1,\ldots,T.$$

Because we assume linearity, we know $E(v_{it}|\bar v_i)$ is the linear projection (with intercept zero
because $E(v_{it}) = 0$ for all $t$):

$$E(v_{it}|\bar v_i) = \frac{Cov(\bar v_i, v_{it})}{Var(\bar v_i)} \bar v_i.$$

Now

$$Cov(\bar v_i, v_{it}) = Cov\Big( T^{-1} \sum_{r=1}^{T} v_{ir}, v_{it} \Big) = T^{-1} \Big[ Var(v_{it}) + \sum_{r \neq t} Cov(v_{ir}, v_{it}) \Big]$$
$$= T^{-1}[(\sigma_c^2 + \sigma_u^2) + (T-1)\sigma_c^2] = \sigma_c^2 + \sigma_u^2/T.$$

Also, because $\bar v_i = c_i + \bar u_i$, and $\{u_{it} : t = 1,\ldots,T\}$ is serially uncorrelated with constant
variance, and each $u_{it}$ is uncorrelated with $c_i$,

$$Var(\bar v_i) = Var(c_i) + Var(\bar u_i) = \sigma_c^2 + \sigma_u^2/T = Cov(\bar v_i, v_{it}).$$

We have shown that the slope in the population regression of $v_{it}$ on $\bar v_i$ is unity, and so
$E(v_{it}|\bar v_i) = \bar v_i$.
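The unit slope in part a can be checked with exact arithmetic directly from the RE variance matrix, which has $\sigma_c^2 + \sigma_u^2$ on the diagonal and $\sigma_c^2$ elsewhere. The sketch below uses hypothetical variance values, and the helper names are introduced here for illustration:

```python
from fractions import Fraction

def re_cov_matrix(T, sigma2_c, sigma2_u):
    """T x T variance matrix of (v_i1, ..., v_iT) under the RE structure."""
    return [[Fraction(sigma2_c) + (Fraction(sigma2_u) if r == t else 0)
             for t in range(T)] for r in range(T)]

def projection_slope(T, sigma2_c, sigma2_u, t=0):
    """Cov(vbar_i, v_it) / Var(vbar_i), computed from the variance matrix."""
    omega = re_cov_matrix(T, sigma2_c, sigma2_u)
    cov_bar_t = sum(omega[r][t] for r in range(T)) / T
    var_bar = sum(sum(row) for row in omega) / T ** 2
    return cov_bar_t / var_bar

# Hypothetical values: T = 5, sigma_c^2 = 2, sigma_u^2 = 3.
print(projection_slope(5, 2, 3))
```

Both the covariance and the variance equal $\sigma_c^2 + \sigma_u^2/T$ (here 13/5), so the ratio is exactly one for any period $t$.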


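b. Because $\bar y_i = \bar{\mathbf{x}}_i\boldsymbol\beta + \bar v_i$, $E(v_{it}|\mathbf{x}_i, \bar v_i) = E(v_{it}|\mathbf{x}_i, \bar y_i)$, and so

$$E(y_{it}|\mathbf{x}_i, \bar y_i) = E(\mathbf{x}_{it}\boldsymbol\beta + v_{it}|\mathbf{x}_i, \bar y_i) = \mathbf{x}_{it}\boldsymbol\beta + E(v_{it}|\mathbf{x}_i, \bar y_i) = \mathbf{x}_{it}\boldsymbol\beta + \bar v_i$$
$$= \mathbf{x}_{it}\boldsymbol\beta + (\bar y_i - \bar{\mathbf{x}}_i\boldsymbol\beta).$$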


c. We can rewrite equation in part b as follows. 


Vit = Xup +i- XB + ri 
E(rulX:, Yi) = 0 


To impose the coefficient of unity on y; and the common vector on x; and X;, write 
Yi —Yi = (Xe —¥)B + 7re, t = 1,...,7, 
which we can estimate consistently by POLS because E(r;;|x;) = 0 for all t. 
d. The RE estimator is not based on $E(y_{it}|x_i,\bar y_i) = E(y_{it}|x_{it},\bar y_i,\bar x_i)$, but on
$E(y_{it}|x_i) = E(y_{it}|x_{it})$. We now see that the FE estimator can be given an interpretation as an
estimator that “controls for” $\bar y_i$, along with $\bar x_i$ (even though it does not need to under the RE
assumptions).


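e. Under Assumptions RE.1 to RE.3 we can derive
\[ L(v_{it}|x_i,\bar v_i) = \bar v_i, \quad t = 1,\dots,T, \]
without further assumptions. First, we know that the LP has the form
\[ L(v_{it}|x_i,\bar v_i) = x_i\psi_t + \rho_t\bar v_i \equiv w_i\delta_t, \]
where
\[ \delta_t = [E(w_i'w_i)]^{-1}E(w_i'v_{it}). \]
But $E(w_i'w_i)$ is block diagonal because $E(x_i'\bar v_i) = 0$. Further, $E(x_i'v_{it}) = 0$, and so
\[ \delta_t = \begin{pmatrix} 0 \\ \rho_t \end{pmatrix}. \]
It is easy to see that
\[ \rho_t = \frac{E(\bar v_i v_{it})}{E(\bar v_i^2)} = \frac{\mathrm{Cov}(\bar v_i, v_{it})}{\mathrm{Var}(\bar v_i)} = 1. \]
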
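10.16. a. By independence between $(v_{i1}, v_{i2},\dots, v_{iT}, v_{i,T+1})$ and $(x_{i1}, x_{i2},\dots, x_{iT}, x_{i,T+1})$, it
follows immediately that
\[ E(v_{i,T+1}|x_i, x_{i,T+1}, v_{i1},\dots,v_{iT}) = E(v_{i,T+1}|v_{i1},\dots,v_{iT}). \]
Because we are assuming that all conditional expectations involving $\{v_{it}\}$ are linear, we know
$E(v_{i,T+1}|v_{i1},\dots,v_{iT})$ is a linear function of $(v_{i1},\dots,v_{iT})$. The tricky part is to show that
$E(v_{i,T+1}|v_{i1},\dots,v_{iT}) = E(v_{i,T+1}|\bar v_i)$. Intuitively it makes sense that the elements in $(v_{i1},\dots,v_{iT})$
should get equal weight under the RE variance-covariance structure. One way to verify that
each element gets the same weight, and to determine that common weight, is to note that the
vector in the LP, say $\rho_T$, satisfies
\[ \begin{pmatrix} \sigma_c^2+\sigma_u^2 & \sigma_c^2 & \cdots & \sigma_c^2 \\ \sigma_c^2 & \sigma_c^2+\sigma_u^2 & \cdots & \sigma_c^2 \\ \vdots & & \ddots & \vdots \\ \sigma_c^2 & \sigma_c^2 & \cdots & \sigma_c^2+\sigma_u^2 \end{pmatrix} \rho_T = \begin{pmatrix} \sigma_c^2 \\ \sigma_c^2 \\ \vdots \\ \sigma_c^2 \end{pmatrix}. \]
If we hypothesize that $\rho_T = \eta_T j_T$, where $j_T$ is the $T \times 1$ vector of ones, then
\[ (T\sigma_c^2 + \sigma_u^2)\eta_T = \sigma_c^2, \]
so
\[ \eta_T = \frac{\sigma_c^2}{T\sigma_c^2 + \sigma_u^2}. \]
This must be the unique solution because the RE variance matrix is assumed to be nonsingular
($\sigma_u^2 > 0$). So we have shown
\[ E(v_{i,T+1}|x_i, x_{i,T+1}, v_{i1},\dots,v_{iT}) = \left[\frac{\sigma_c^2}{T\sigma_c^2 + \sigma_u^2}\right](v_{i1} + v_{i2} + \dots + v_{iT}) = \left[\frac{\sigma_c^2}{\sigma_c^2 + \sigma_u^2/T}\right]\bar v_i, \]
which is what the problem asked to show.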
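As a quick numerical check of the equal-weights solution, the sketch below builds the RE variance matrix for assumed (illustrative) values of $\sigma_c^2$, $\sigma_u^2$, and $T$ and confirms that $\rho_T = \eta_T j_T$ satisfies the linear system. The function name is hypothetical.

```python
def lp_weights_check(sigma_c2, sigma_u2, T):
    """Return eta_T and the product Omega * (eta_T * j_T) for the RE variance matrix Omega."""
    eta = sigma_c2 / (T * sigma_c2 + sigma_u2)
    # RE variance matrix: sigma_c2 + sigma_u2 on the diagonal, sigma_c2 elsewhere.
    omega = [[sigma_c2 + (sigma_u2 if r == t else 0.0) for t in range(T)]
             for r in range(T)]
    rho = [eta] * T  # the hypothesized equal-weights solution
    lhs = [sum(omega[r][t] * rho[t] for t in range(T)) for r in range(T)]
    return eta, lhs


# Illustrative (assumed) values.
eta, lhs = lp_weights_check(sigma_c2=1.5, sigma_u2=0.5, T=5)
theta = 5 * eta  # weight on vbar_i, equal to sigma_c2 / (sigma_c2 + sigma_u2 / T)
print(eta, lhs, theta)
```

Every element of the product equals $\sigma_c^2$, the covariance between $v_{i,T+1}$ and each $v_{it}$, as the system requires.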
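b. This is straightforward given part a:
\[ E(y_{i,T+1}|x_i, x_{i,T+1}, v_{i1},\dots,v_{iT}) = x_{i,T+1}\beta + E(v_{i,T+1}|x_i, x_{i,T+1}, v_{i1},\dots,v_{iT}) \]
\[ = x_{i,T+1}\beta + E(v_{i,T+1}|\bar v_i) = x_{i,T+1}\beta + \left[\frac{\sigma_c^2}{\sigma_c^2 + \sigma_u^2/T}\right]\bar v_i. \]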


c. If we condition only on the history of covariates, and not the past composite errors, we get
\[ E(y_{i,T+1}|x_i, x_{i,T+1}) = x_{i,T+1}\beta + E(v_{i,T+1}|x_i, x_{i,T+1}) = x_{i,T+1}\beta + 0 = x_{i,T+1}\beta \]
because we are assuming RE.1 for all $T + 1$ time periods.


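d. Forecast errors are given by:
\[ y_{i,T+1} - E(y_{i,T+1}|x_i, x_{i,T+1}, v_{i1},\dots,v_{iT}) = (x_{i,T+1}\beta + v_{i,T+1}) - \left\{x_{i,T+1}\beta + \left[\frac{\sigma_c^2}{\sigma_c^2 + \sigma_u^2/T}\right]\bar v_i\right\} \]
\[ = v_{i,T+1} - \left[\frac{\sigma_c^2}{\sigma_c^2 + \sigma_u^2/T}\right]\bar v_i. \]
Let $\theta = \sigma_c^2/(\sigma_c^2 + \sigma_u^2/T)$. Then the variance of the forecast error is
\[ \mathrm{Var}(v_{i,T+1} - \theta\bar v_i) = \mathrm{Var}(v_{i,T+1}) + \theta^2\mathrm{Var}(\bar v_i) - 2\theta\,\mathrm{Cov}(v_{i,T+1}, \bar v_i) \]
\[ = \sigma_c^2 + \sigma_u^2 - 2\theta\sigma_c^2 + \theta^2(\sigma_c^2 + \sigma_u^2/T) \]
\[ = \sigma_c^2 + \sigma_u^2 - \frac{2\sigma_c^4}{\sigma_c^2 + \sigma_u^2/T} + \frac{\sigma_c^4}{\sigma_c^2 + \sigma_u^2/T} \]
\[ = \sigma_c^2 + \sigma_u^2 - \sigma_c^2\theta. \]
If we use $E(y_{i,T+1}|x_i, x_{i,T+1})$ to forecast $y_{i,T+1}$ the forecast error is simply
$v_{i,T+1} = y_{i,T+1} - E(y_{i,T+1}|x_i, x_{i,T+1})$, which has variance $\sigma_c^2 + \sigma_u^2$. Because $\sigma_c^2 \geq 0$ and $\theta \geq 0$,
\[ \mathrm{Var}(v_{i,T+1} - \theta\bar v_i) \leq \mathrm{Var}(v_{i,T+1}), \]
with strict inequality when $\sigma_c^2 > 0$. Of course, all we have really shown is that
$\mathrm{Var}[v_{i,T+1} - E(v_{i,T+1}|\bar v_i)] \leq \mathrm{Var}(v_{i,T+1})$, which we already know from general properties of conditional
expectations. Using more information in a conditional mean results in a smaller prediction
error variance.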
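The two forecast error variances are easy to compare numerically. This is a small sketch under the RE assumptions used above; the function name and the variance components plugged in are illustrative assumptions.

```python
def forecast_error_variances(sigma_c2, sigma_u2, T):
    """Return theta and the forecast error variances with and without past composite errors."""
    theta = sigma_c2 / (sigma_c2 + sigma_u2 / T)
    # Long form: Var(v) + theta^2 Var(vbar) - 2 theta Cov(v_{T+1}, vbar).
    long_form = (sigma_c2 + sigma_u2
                 + theta ** 2 * (sigma_c2 + sigma_u2 / T)
                 - 2.0 * theta * sigma_c2)
    # Simplified form derived in the text.
    with_history = sigma_c2 + sigma_u2 - sigma_c2 * theta
    # Conditioning on covariates only leaves the full composite error variance.
    covariates_only = sigma_c2 + sigma_u2
    return theta, long_form, with_history, covariates_only


# Illustrative (assumed) values.
theta, long_form, with_history, covariates_only = forecast_error_variances(1.0, 1.0, 4)
print(theta, long_form, with_history, covariates_only)
```

The long and simplified forms agree, and the variance that uses the past composite errors is strictly smaller whenever $\sigma_c^2 > 0$.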

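e. We can use $N$ cross section observations and the first $T$ time periods to estimate the
parameters by random effects. Let $\hat\beta_{RE}$ be the RE estimator, and let
$\hat\theta = \hat\sigma_c^2/(\hat\sigma_c^2 + \hat\sigma_u^2/T)$ be the estimate of $\theta$ from the usual RE variance component estimates.
Then
\[ \tilde y_{i,T+1} = x_{i,T+1}\hat\beta_{RE} + \hat\theta\,\bar{\hat v}_i, \]
where
\[ \bar{\hat v}_i = T^{-1}\sum_{t=1}^{T}\hat v_{it} \quad\text{and}\quad \hat v_{it} = y_{it} - x_{it}\hat\beta_{RE}. \]
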
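10.17. a. By the usual averaging across time, the quasi-time-demeaned equation can be
written, for each time period, as
\[ y_{it} - \lambda\bar y_i = (1-\lambda)\alpha + (d_t - \lambda\bar d)\eta + (w_{it} - \lambda\bar w_i)\delta + (v_{it} - \lambda\bar v_i) \]
\[ = [(1-\lambda)\alpha + (1-\lambda)\bar d\eta] + (d_t - \bar d)\eta + (w_{it} - \lambda\bar w_i)\delta + (v_{it} - \lambda\bar v_i), \]
which is what we wanted to show by letting $\mu = (1-\lambda)\alpha + (1-\lambda)\bar d\eta$.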

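b. The first part, the $\sqrt{N}$-asymptotic representation of the RE estimator, is just the usual
linear representation of a pooled OLS estimator laid out in Chapter 7. It also follows from the
discussion of random effects in Section 10.7.2. For the second part, write
$v_{it} - \lambda\bar v_i = (c_i + u_{it}) - (\lambda c_i + \lambda\bar u_i) = (1-\lambda)c_i + u_{it} - \lambda\bar u_i$, and plug in:
\[ \sum_{t=1}^{T}(d_t - \bar d)'(v_{it} - \lambda\bar v_i) = \sum_{t=1}^{T}(d_t - \bar d)'(1-\lambda)c_i + \sum_{t=1}^{T}(d_t - \bar d)'u_{it} - \sum_{t=1}^{T}(d_t - \bar d)'\lambda\bar u_i \]
\[ = (1-\lambda)c_i\sum_{t=1}^{T}(d_t - \bar d)' + \sum_{t=1}^{T}(d_t - \bar d)'u_{it} - (\lambda\bar u_i)\sum_{t=1}^{T}(d_t - \bar d)' = \sum_{t=1}^{T}(d_t - \bar d)'u_{it} \]
because $\sum_{t=1}^{T}(d_t - \bar d) = 0$.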
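c. Actually, there is nothing to do here. This is just the usual first-order representation of
the fixed effects estimator, which follows from the general pooled OLS results.

d. From part b we can write
\[ A_1\sqrt{N}(\hat\beta_{RE} - \beta) = N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} g_{it}'(v_{it} - \lambda\bar v_i) + o_p(1) = \Big(N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} r_{it}\Big) + o_p(1), \]
where $g_{it} = (d_t - \bar d,\ w_{it} - \lambda\bar w_i)$ and $r_{it} = [(d_t - \bar d)u_{it},\ (w_{it} - \lambda\bar w_i)(v_{it} - \lambda\bar v_i)]'$. From part c we can write
\[ A_2\sqrt{N}(\hat\beta_{FE} - \beta) = N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} h_{it}'u_{it} + o_p(1) = \Big(N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\Big) + o_p(1), \]
where $h_{it} = (d_t - \bar d,\ w_{it} - \bar w_i)$ and $s_{it} = [(d_t - \bar d)u_{it},\ (w_{it} - \bar w_i)u_{it}]'$. But the first $R$ elements of $r_{it}$ and $s_{it}$ are $(d_t - \bar d)'u_{it}$,
which implies that
\[ A_1\sqrt{N}(\hat\beta_{RE} - \beta) - A_2\sqrt{N}(\hat\beta_{FE} - \beta) = N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\begin{pmatrix} 0 \\ (w_{it} - \lambda\bar w_i)'e_{it} - (w_{it} - \bar w_i)'u_{it} \end{pmatrix} + o_p(1), \]
where “0” is $R \times 1$ and $e_{it} = v_{it} - \lambda\bar v_i$. The second part of the vector is $M \times 1$ and generally
satisfies the central limit theorem. Under standard rank assumptions
$\mathrm{Var}[(w_{it} - \lambda\bar w_i)'e_{it} - (w_{it} - \bar w_i)'u_{it}]$ has rank $M$.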

e. If there were no $w_{it}$, part d would imply that the limiting distribution of the difference
between RE and FE is degenerate. In other words, we cannot compute the Hausman test
comparing the FE and RE estimators if the only time-varying covariates are aggregates. (In
fact, the FE and RE estimates are numerically identical in this case.) More generally, the
variance-covariance matrix of the difference has rank $M$, not $M + R$ (whether or not we assume
RE.3 under $H_0$). A properly computed Hausman test will have only $M$ degrees of freedom, not
$M + R$. The regression-based test from equation (10.88) forces one to get the
degrees of freedom correct, as there is obviously no value in adding $\bar d$, a vector of constants, to


the regression.

10.18. a. The Stata results are given below. All estimates are numerically identical.

. reg lwage d81-d87

      Source |       SS       df       MS
-------------+------------------------------
       Model |  92.9668229     7  13.2809747
    Residual |  1143.56282  4352  .262767192
-------------+------------------------------
       Total |  1236.52964  4359  .283672779

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d81 |   .1193902   .0310529     3.84   0.000     .0585107    .1802697
         d82 |   .1781901   .0310529     5.74   0.000     .1173106    .2390696
         d83 |   .2257865   .0310529     7.27   0.000     .1649069     .286666
         d84 |   .2968181   .0310529     9.56   0.000     .2359386    .3576976
         d85 |   .3459333   .0310529    11.14   0.000     .2850538    .4068128
         d86 |   .4062418   .0310529    13.08   0.000     .3453623    .4671213
         d87 |   .4730023   .0310529    15.23   0.000     .4121228    .5338818
       _cons |   1.393477   .0219577    63.46   0.000     1.350429    1.436525
------------------------------------------------------------------------------

. xtreg lwage d81-d87, re

Random-effects GLS regression
Group variable: nr
R-sq:  within  = 0.0000
       between = 0.0000
       overall = 0.0752
Random effects u_i ~ Gaussian
corr(u_i, X)       = 0 (assumed)

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d81 |   .1193902    .021487     5.56   0.000     .0772765    .1615039
         d82 |   .1781901    .021487     8.29   0.000     .1360764    .2203038
         d83 |   .2257865    .021487    10.51   0.000     .1836728    .2679001
         d84 |   .2968181    .021487    13.81   0.000     .2547044    .3389318
         d85 |   .3459333    .021487    16.10   0.000     .3038196     .388047
         d86 |   .4062418    .021487    18.91   0.000     .3641281    .4483555
         d87 |   .4730023    .021487    22.01   0.000     .4308886     .515116
       _cons |   1.393477   .0219577    63.46   0.000     1.350441    1.436513
-------------+----------------------------------------------------------------
     sigma_u |  .37007665
     sigma_e |  .35469771
         rho |  .52120938   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtreg lwage d81-d87, fe

Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d81 |   .1193902    .021487     5.56   0.000     .0772631    .1615173
         d82 |   .1781901    .021487     8.29   0.000      .136063    .2203172
         d83 |   .2257865    .021487    10.51   0.000     .1836594    .2679135
         d84 |   .2968181    .021487    13.81   0.000      .254691    .3389452
         d85 |   .3459333    .021487    16.10   0.000     .3038063    .3880604
         d86 |   .4062418    .021487    18.91   0.000     .3641147    .4483688
         d87 |   .4730023    .021487    22.01   0.000     .4308753    .5151294
       _cons |   1.393477   .0151936    91.71   0.000     1.363689    1.423265
-------------+----------------------------------------------------------------
     sigma_u |  .39074676
     sigma_e |  .35469771
         rho |  .54824631   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(544, 3808) =     8.45           Prob > F = 0.0000

. reg d.(lwage d81-d87), nocons

      Source |       SS       df       MS
-------------+------------------------------
       Model |  19.3631642     7  2.76616631
    Residual |  749.249837  3808  .196756785

------------------------------------------------------------------------------
     D.lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       D.d81 |   .1193902   .0190006     6.28   0.000     .0821379    .1566425
       D.d82 |   .1781901   .0268709     6.63   0.000     .1255074    .2308728
       D.d83 |   .2257865     .03291     6.86   0.000     .1612636    .2903093
       D.d84 |   .2968181   .0380011     7.81   0.000     .2223136    .3713226
       D.d85 |   .3459333   .0424866     8.14   0.000     .2626347    .4292319
       D.d86 |   .4062418   .0465417     8.73   0.000     .3149927    .4974908
       D.d87 |   .4730023   .0502708     9.41   0.000     .3744421    .5715626
------------------------------------------------------------------------------



b. The Stata output follows. The POLS and RE estimates are identical on the year dummies 
and the three time-constant variables. This is a general result: if the model includes only 
aggregate time effects and individual-specific covariates that have no time variation, POLS = 
RE (and, in particular, there is no efficiency gain in using RE). 

When FE is used, of course the time-constant variables drop out. The estimates on the year 
dummies are the same as POLS and RE. (Recall that the “constant” reported by FE is the 
average of the estimated heterogeneity terms. When POLS and RE include time-constant 


variables the FE “constant” does not equal the intercept from POLS/RE.) 


. reg lwage d81-d87 educ black hisp

      Source |       SS       df       MS              Number of obs =    4360
-------------+------------------------------           F( 10,  4349) =   73.66
       Model |  179.091659    10  17.9091659           Prob > F      =  0.0000
    Residual |  1057.43798  4349  .243145087           R-squared     =  0.1448
-------------+------------------------------           Adj R-squared =  0.1429
       Total |  1236.52964  4359  .283672779           Root MSE      =   .4931

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d81 |   .1193902    .029871     4.00   0.000     .0608279    .1779526
         d82 |   .1781901    .029871     5.97   0.000     .1196277    .2367524
         d83 |   .2257865    .029871     7.56   0.000     .1672241    .2843488
         d84 |   .2968181    .029871     9.94   0.000     .2382557    .3553804
         d85 |   .3459333    .029871    11.58   0.000      .287371    .4044957
         d86 |   .4062418    .029871    13.60   0.000     .3476794    .4648041
         d87 |   .4730023    .029871    15.83   0.000       .41444    .5315647
        educ |   .0770943   .0043766    17.62   0.000     .0685139    .0856747
       black |  -.1225637   .0237021    -5.17   0.000    -.1690319   -.0760955
        hisp |    .024623   .0213056     1.16   0.248    -.0171468    .0663928
       _cons |   .4966384   .0566686     8.76   0.000     .3855391    .6077377
------------------------------------------------------------------------------

. xtreg lwage d81-d87 educ black hisp, re

Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545
R-sq:  within  = 0.1625                         Obs per group: min =         8
       between = 0.1296                                        avg =       8.0
       overall = 0.1448                                        max =         8
Random effects u_i ~ Gaussian                   Wald chi2(10)      =    819.51
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d81 |   .1193902    .021487     5.56   0.000     .0772765    .1615039
         d82 |   .1781901    .021487     8.29   0.000     .1360764    .2203038
         d83 |   .2257865    .021487    10.51   0.000     .1836728    .2679001
         d84 |   .2968181    .021487    13.81   0.000     .2547044    .3389318
         d85 |   .3459333    .021487    16.10   0.000     .3038196     .388047
         d86 |   .4062418    .021487    18.91   0.000     .3641281    .4483555
         d87 |   .4730023    .021487    22.01   0.000     .4308886     .515116
        educ |   .0770943    .009177     8.40   0.000     .0591076    .0950809
       black |  -.1225637   .0496994    -2.47   0.014    -.2199728   -.0251546
        hisp |    .024623   .0446744     0.55   0.582    -.0629371    .1121831
       _cons |   .4966384   .1122718     4.42   0.000     .2765897    .7166871
-------------+----------------------------------------------------------------
     sigma_u |  .34337144
     sigma_e |  .35469771
         rho |  .48377912   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtreg lwage d81-d87 educ black hisp, fe
note: educ omitted because of collinearity
note: black omitted because of collinearity
note: hisp omitted because of collinearity

Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545
R-sq:  within  = 0.1625                         Obs per group: min =         8
       between =      .                                        avg =       8.0
       overall = 0.0752                                        max =         8
                                                F(7,3808)          =    105.56
corr(u_i, Xb)  = 0.0000                         Prob > F           =    0.0000

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d81 |   .1193902    .021487     5.56   0.000     .0772631    .1615173
         d82 |   .1781901    .021487     8.29   0.000      .136063    .2203172
         d83 |   .2257865    .021487    10.51   0.000     .1836594    .2679135
         d84 |   .2968181    .021487    13.81   0.000      .254691    .3389452
         d85 |   .3459333    .021487    16.10   0.000     .3038063    .3880604
         d86 |   .4062418    .021487    18.91   0.000     .3641147    .4483688
         d87 |   .4730023    .021487    22.01   0.000     .4308753    .5151294
        educ |  (omitted)
       black |  (omitted)
        hisp |  (omitted)
       _cons |   1.393477   .0151936    91.71   0.000     1.363689    1.423265
-------------+----------------------------------------------------------------
     sigma_u |  .39074676
     sigma_e |  .35469771
         rho |  .54824631   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(544, 3808) =     8.45           Prob > F = 0.0000

c. The reported standard errors for POLS and RE are not the same. The POLS standard
errors assume, in addition to homoskedasticity, no serial correlation in the composite error; in
other words, that there is no unobserved heterogeneity. At least the RE standard errors allow
for the standard RE structure, which means constant variance and correlations that are the
same across all pairs $(t, s)$. This may be too restrictive, but it is less restrictive than the usual
OLS standard errors.


d. The fully robust POLS standard errors — that allow any kind of serial correlation and 
heteroskedasticity — are reported below. We prefer these to the usual RE standard errors 
because, as noted in part c, the usual RE standard errors impose a special kind of serial 
correlation. Notice that the fully robust POLS standard errors are not uniformly larger than the 


usual RE standard errors. 


. reg lwage d81-d87 educ black hisp, cluster(nr)

Linear regression                                      Number of obs =    4360
                                                       F( 10,   544) =   49.41
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1448
                                                       Root MSE      =   .4931

                                   (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d81 |   .1193902   .0244086     4.89   0.000     .0714435    .1673369
         d82 |   .1781901   .0241987     7.36   0.000     .1306558    .2257243
         d83 |   .2257865   .0243796     9.26   0.000     .1778968    .2736761
         d84 |   .2968181   .0271485    10.93   0.000     .2434894    .3501468
         d85 |   .3459333   .0263181    13.14   0.000     .2942358    .3976309
         d86 |   .4062418   .0273064    14.88   0.000     .3526029    .4598807
         d87 |   .4730023    .025996    18.20   0.000     .4219374    .5240672
        educ |   .0770943   .0090198     8.55   0.000     .0593763    .0948122
       black |  -.1225637   .0532662    -2.30   0.022    -.2271964    -.017931
        hisp |    .024623   .0411235     0.60   0.550    -.0561573    .1054033
       _cons |   .4966384   .1097474     4.53   0.000     .2810579    .7122189
------------------------------------------------------------------------------


e. The fully robust standard errors for RE are given below. They are numerically identical
to the fully robust POLS standard errors. Because we really have only one estimator (remember,
POLS = RE in this setup) there is one asymptotic variance. While there could be different
ways to estimate that asymptotic variance, in this case the estimators are the same, and that is
appealing because it means inference does not rely on the particular pre-programmed
command.


. xtreg lwage d81-d87 educ black hisp, re cluster(nr)

Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545
R-sq:  within  = 0.1625                         Obs per group: min =         8
       between = 0.1296                                        avg =       8.0
       overall = 0.1448                                        max =         8
Random effects u_i ~ Gaussian                   Wald chi2(10)      =    494.13
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

                                   (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d81 |   .1193902   .0244086     4.89   0.000     .0715502    .1672302
         d82 |   .1781901   .0241987     7.36   0.000     .1307616    .2256186
         d83 |   .2257865   .0243796     9.26   0.000     .1780033    .2735696
         d84 |   .2968181   .0271485    10.93   0.000     .2436081    .3500281
         d85 |   .3459333   .0263181    13.14   0.000     .2943508    .3975159
         d86 |   .4062418   .0273064    14.88   0.000     .3527222    .4597613
         d87 |   .4730023    .025996    18.20   0.000      .422051    .5239536
        educ |   .0770943   .0090198     8.55   0.000     .0594157    .0947728
       black |  -.1225637   .0532662    -2.30   0.021    -.2269636   -.0181638
        hisp |    .024623   .0411235     0.60   0.549    -.0559775    .1052236
       _cons |   .4966384   .1097474     4.53   0.000     .2815375    .7117392
-------------+----------------------------------------------------------------
     sigma_u |  .34337144
     sigma_e |  .35469771
         rho |  .48377912   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Solutions to Chapter 11 Problems

11.1. a. It is important to remember that, any time we put a variable in a regression model
(whether we are using cross section or panel data), we are controlling for the effects of that
variable on the dependent variable. The whole point of regression analysis is that it allows the
explanatory variables to be correlated while estimating the ceteris paribus effect of each
explanatory variable. Thus, the inclusion of $y_{i,t-1}$ in the equation allows $prog_{it}$ to be correlated
with $y_{i,t-1}$, and also recognizes that, due to inertia, $y_{it}$ is often strongly related to $y_{i,t-1}$.

An assumption that implies pooled OLS is consistent is
\[ E(u_{it}|z_i, x_{it}, y_{i,t-1}, prog_{it}) = 0, \quad \text{all } t, \]
which is implied by but is weaker than dynamic completeness. Without additional
assumptions, the pooled OLS standard errors and test statistics need to be adjusted for
heteroskedasticity and serial correlation (although the latter will not be present under dynamic
completeness).

When $y_{i,t-1}$ is added to a regression model in an astructural way, we can think of the goal as
being to estimate
\[ E(y_{it}|z_i, x_{it}, y_{i,t-1}, prog_{it}), \]
which means that we are controlling for differences in the lagged response when gauging the
effect of the program. Of course, we might not have the conditional mean correctly specified;
we may be simply estimating a linear projection.
b. As we discussed in Section 7.8.2, this statement is incorrect. Provided our interest is in
$E(y_{it}|z_i, x_{it}, y_{i,t-1}, prog_{it})$, we are not especially concerned about serial correlation in the implied
errors,
\[ u_{it} \equiv y_{it} - E(y_{it}|z_i, x_{it}, y_{i,t-1}, prog_{it}). \]
Nor does serial correlation cause inconsistency in the OLS estimators.

c. Such a model is the standard unobserved effects model:
\[ y_{it} = \theta_t + x_{it}\beta + \delta_1 prog_{it} + c_i + u_{it}, \quad t = 1,2,\dots,T, \]
where the $\theta_t$ are the time effects (that can be treated as parameters). We would probably
assume that $\{(x_{it}, prog_{it})\}$ is strictly exogenous; the weakest form of strict exogeneity is that
$(x_{it}, prog_{it})$ is uncorrelated with $u_{is}$ for all $t$ and $s$. Then we could estimate the equation by
fixed effects or first differencing. If the $u_{it}$ are serially uncorrelated, FE is preferred. We could
also do a GLS analysis after the fixed effects or first-differencing transformations, but we
should have a large $N$.

d. A model that incorporates features from parts a and c is
\[ y_{it} = \theta_t + x_{it}\beta + \delta_1 prog_{it} + \rho_1 y_{i,t-1} + c_i + u_{it}, \quad t = 1,\dots,T. \]
Now, program participation can depend on unobserved city heterogeneity as well as on lagged
$y_{it}$ (we assume that $y_{i0}$ is observed). Fixed effects and first differencing are both inconsistent
as $N \to \infty$ with fixed $T$.

Assuming that $E(u_{it}|x_i, prog_i, y_{i,t-1}, y_{i,t-2},\dots,y_{i0}) = 0$, a consistent procedure is obtained by
first differencing, to get
\[ \Delta y_{it} = \Delta\theta_t + \Delta x_{it}\beta + \delta_1\Delta prog_{it} + \rho_1\Delta y_{i,t-1} + \Delta u_{it}, \quad t = 2,\dots,T. \]
At time $t$, $\Delta x_{it}$ and $\Delta prog_{it}$ can be used as their own instruments, along with $y_{i,t-j}$ for $j \geq 2$.
Either pooled 2SLS or a GMM procedure can be used. Past and future values of $x_{it}$ can also be
used as instruments because $\{x_{it}\}$ is strictly exogenous.


11.2. a. OLS estimation on the first-differenced equation is inconsistent (for all parameters)
if $\mathrm{Cov}(\Delta w_i, \Delta u_i) \neq 0$. Because $w_{it}$ is correlated with $u_{it}$ for all $t$, we cannot assume that $\Delta w_i$
and $\Delta u_i$ are uncorrelated.

b. Because $u_{it}$ is uncorrelated with $z_{i1}$, $z_{i2}$ for $t = 1,2$, $\Delta u_i = u_{i2} - u_{i1}$ is uncorrelated with
$z_{i1}$ and $z_{i2}$, and so $(z_{i1}, z_{i2})$ are exogenous in the equation
\[ \Delta y_i = \Delta z_i\gamma + \delta\Delta w_i + \Delta u_i. \]
The linear projection of $\Delta w_i$ on $(z_{i1}, z_{i2})$ can be written as
\[ \Delta w_i = z_{i1}\pi_1 + z_{i2}\pi_2 + r_i, \quad E(z_{it}'r_i) = 0,\ t = 1,2. \]
The question is whether the rank condition holds. Rewrite this linear projection in terms of $\Delta z_i$
and, say, $z_{i1}$ as
\[ \Delta w_i = z_{i1}(\pi_1 + \pi_2) + (z_{i2} - z_{i1})\pi_2 + r_i = z_{i1}\lambda_1 + \Delta z_i\pi_2 + r_i, \]
where $\lambda_1 \equiv \pi_1 + \pi_2$. If $\lambda_1 = 0$, that is $\pi_1 = -\pi_2$, then the reduced form of $\Delta w_i$ depends only on
$\Delta z_i$. Because $\Delta z_i$ appears in the equation for $\Delta y_i$, there are no instruments for $\Delta w_i$. Thus, the
change in $w_{it}$ must depend on the level of $z_{it}$, and not just on the change in $z_{it}$.
c. With $T > 2$ time periods we can write the differenced equation as
\[ \Delta y_{it} = \Delta z_{it}\gamma + \delta\Delta w_{it} + \Delta u_{it}, \quad t = 2,\dots,T. \]
Now, under the assumption that $w_{is}$ is uncorrelated with $u_{it}$ for $s < t$, we have natural
instruments for $\Delta w_{it}$. At time $t$, $\Delta u_{it}$ depends on $u_{it}$ and $u_{i,t-1}$. Thus, valid instruments at time $t$
in the FD equation are $w_{i,t-2},\dots,w_{i1}$. We need $T \geq 3$ for an IV procedure to work. With $T = 3$
we have the cross-sectional equation
\[ \Delta y_{i3} = \Delta z_{i3}\gamma + \delta\Delta w_{i3} + \Delta u_{i3} \]
and we can instrument for $\Delta w_{i3}$ with $w_{i1}$ (and possibly $z_{it}$ from earlier time periods).

With $T \geq 4$, we can implement an IV estimator by using the simple pooled IV estimator
described in Section 11.4. Or, we can use the more efficient GMM procedure. Write the $T - 1$
time periods as
\[ \Delta y_i = \Delta Z_i\gamma + \delta\Delta w_i + \Delta u_i, \]
where each data vector or matrix has $T - 1$ rows. The matrix that includes all possible
instruments for observation $i$ (with $T - 1$ rows) is
\[ \begin{pmatrix} z_i & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & z_i & w_{i1} & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & z_i & w_{i2} & w_{i1} & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & & & & & \ddots & & & & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & \cdots & z_i & w_{i,T-2} & \cdots & w_{i1} \end{pmatrix}. \]
Putting in the levels of all $z_{it}$ for instruments in each time period is perhaps using too many
overidentifying restrictions. The dimension could be reduced substantially by using only
$(z_{it}, z_{i,t-1})$ at period $t$ rather than $z_i$. Further, for periods $t > 3$ one would use only $w_{i,t-2}$ and
$w_{i,t-3}$ as the IVs.
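The block structure of the full instrument matrix can be made concrete with a short sketch. It assumes, for illustration only, that $z_i$ has been flattened into a plain list and that $w_{i1},\dots,w_{iT}$ are stored in order; the function names and the sample values are hypothetical.

```python
def fd_instruments(z_i, w_i, t):
    """Instruments for the FD equation at period t (1-based, t >= 2):
    all of z_i, followed by w_{i,t-2}, ..., w_{i1}."""
    lagged_w = [w_i[s - 1] for s in range(t - 2, 0, -1)]
    return list(z_i) + lagged_w


def instrument_matrix(z_i, w_i):
    """Stack the period-specific instrument rows in block-diagonal form (T - 1 rows)."""
    T = len(w_i)
    blocks = [fd_instruments(z_i, w_i, t) for t in range(2, T + 1)]
    width = sum(len(b) for b in blocks)
    rows, start = [], 0
    for b in blocks:
        row = [0.0] * width
        row[start:start + len(b)] = b  # place this period's instruments in its own columns
        rows.append(row)
        start += len(b)
    return rows


# Hypothetical unit with two elements in z_i and T = 4 observations on w.
Z = instrument_matrix([1.0, 2.0], [10.0, 20.0, 30.0, 40.0])
for row in Z:
    print(row)
```

The row for $t = 2$ contains only $z_i$, the row for $t = 3$ adds $w_{i1}$, and the row for $t = 4$ adds $w_{i2}$ and $w_{i1}$, so the number of columns grows quickly with $T$, which is the motivation for trimming the instrument list.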

d. Generally, the IV estimator applied to the time-demeaned equation is inconsistent. This
is because $w_{is}$ is generally correlated with $\ddot u_{it}$, as the latter depends on the idiosyncratic errors
in all time periods.

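11.3. Writing $y_{it} = \beta x_{it} + c_i + u_{it} - \beta r_{it}$, the fixed effects estimator $\hat\beta_{FE}$ can be written as
\[ \hat\beta_{FE} = \beta + \Big(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar x_i)^2\Big)^{-1}\Big(N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar x_i)\big[u_{it} - \bar u_i - \beta(r_{it} - \bar r_i)\big]\Big). \]
Now $x_{it} - \bar x_i = (x_{it}^* - \bar x_i^*) + (r_{it} - \bar r_i)$. Then, because $E(r_{it}|x_i^*, c_i) = 0$ for all $t$, $(x_{it}^* - \bar x_i^*)$ and
$(r_{it} - \bar r_i)$ are uncorrelated, and so
\[ \mathrm{Var}(x_{it} - \bar x_i) = \mathrm{Var}(x_{it}^* - \bar x_i^*) + \mathrm{Var}(r_{it} - \bar r_i), \quad \text{all } t. \]
Similarly, under (11.42), $(x_{it} - \bar x_i)$ and $(u_{it} - \bar u_i)$ are uncorrelated for all $t$. Now
$E[(x_{it} - \bar x_i)(r_{it} - \bar r_i)] = E[\{(x_{it}^* - \bar x_i^*) + (r_{it} - \bar r_i)\}(r_{it} - \bar r_i)] = \mathrm{Var}(r_{it} - \bar r_i)$. By the law of large
numbers and the assumption of constant variances across $t$,
\[ N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar x_i)^2 \xrightarrow{p} \sum_{t=1}^{T}\mathrm{Var}(x_{it} - \bar x_i) = T[\mathrm{Var}(x_{it}^* - \bar x_i^*) + \mathrm{Var}(r_{it} - \bar r_i)] \]
and
\[ N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar x_i)\big[u_{it} - \bar u_i - \beta(r_{it} - \bar r_i)\big] \xrightarrow{p} -T\beta\cdot\mathrm{Var}(r_{it} - \bar r_i). \]
Therefore,
\[ \mathrm{plim}\,\hat\beta_{FE} = \beta - \beta\left(\frac{\mathrm{Var}(r_{it} - \bar r_i)}{\mathrm{Var}(x_{it}^* - \bar x_i^*) + \mathrm{Var}(r_{it} - \bar r_i)}\right) = \beta\left(1 - \frac{\mathrm{Var}(r_{it} - \bar r_i)}{\mathrm{Var}(x_{it}^* - \bar x_i^*) + \mathrm{Var}(r_{it} - \bar r_i)}\right). \]
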
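To see the size of the attenuation, the probability limit can be evaluated for assumed values of the two within-variances. This is only an illustrative sketch; the function name, argument names, and numbers are hypothetical.

```python
def plim_fe(beta, var_xstar_within, var_r_within):
    """plim of the FE estimator under classical measurement error, where the
    arguments are Var(x*_it - xbar*_i) and Var(r_it - rbar_i)."""
    return beta * (1.0 - var_r_within / (var_xstar_within + var_r_within))


# Illustrative (assumed) values: one quarter of the within variation in x is noise.
attenuated = plim_fe(beta=2.0, var_xstar_within=3.0, var_r_within=1.0)
no_error = plim_fe(beta=2.0, var_xstar_within=3.0, var_r_within=0.0)
print(attenuated, no_error)
```

With a quarter of the within variation due to measurement error, the estimator converges to three quarters of $\beta$; with no measurement error it is consistent.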


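11.4. a. For each $i$ we can average across $t$ and rearrange to get
\[ c_i = \bar y_i - \bar x_i\beta - \bar u_i. \]
Because $E(\bar u_i) = 0$, $\mu_c = E(c_i) = E(\bar y_i - \bar x_i\beta)$. By the law of large numbers,
\[ N^{-1}\sum_{i=1}^{N}(c_i + \bar u_i) = N^{-1}\sum_{i=1}^{N}(\bar y_i - \bar x_i\beta) \xrightarrow{p} \mu_c. \]
Now replace $\beta$ with $\hat\beta_{FE}$ and call the estimator $\hat\mu_c$:
\[ \hat\mu_c = N^{-1}\sum_{i=1}^{N}(\bar y_i - \bar x_i\hat\beta_{FE}) = N^{-1}\sum_{i=1}^{N}(\bar y_i - \bar x_i\beta) - \Big(N^{-1}\sum_{i=1}^{N}\bar x_i\Big)(\hat\beta_{FE} - \beta) \]
\[ = N^{-1}\sum_{i=1}^{N}c_i + N^{-1}\sum_{i=1}^{N}\bar u_i - O_p(1)\cdot o_p(1) = N^{-1}\sum_{i=1}^{N}c_i + o_p(1) \xrightarrow{p} \mu_c, \]
where we use $N^{-1}\sum_{i=1}^{N}\bar x_i = O_p(1)$ (by the law of large numbers; see Lemma 3.2) and
$\hat\beta_{FE} - \beta = o_p(1)$.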
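The estimator $\hat\mu_c$ is simple to compute once $\hat\beta_{FE}$ is available. The sketch below handles the scalar-regressor case on a tiny made-up panel with no idiosyncratic error, so the answers can be checked by hand; the data, the function names, and the restriction to a single regressor are all illustrative assumptions.

```python
def fe_slope(y, x):
    """Scalar within (FE) estimator; y and x are lists of per-unit time series."""
    num = den = 0.0
    for yi, xi in zip(y, x):
        ybar, xbar = sum(yi) / len(yi), sum(xi) / len(xi)
        for yit, xit in zip(yi, xi):
            num += (xit - xbar) * (yit - ybar)
            den += (xit - xbar) ** 2
    return num / den


def mean_heterogeneity(y, x):
    """Return (beta_FE, mu_c_hat), where mu_c_hat averages ybar_i - xbar_i * beta_FE."""
    b = fe_slope(y, x)
    c_hat = [sum(yi) / len(yi) - b * sum(xi) / len(xi) for yi, xi in zip(y, x)]
    return b, sum(c_hat) / len(c_hat)


# Made-up panel: y_it = 2 * x_it + c_i with c = (1, -2, 4) and no u_it.
x = [[1.0, 2.0, 4.0], [0.0, 3.0, 3.0], [2.0, 2.0, 5.0]]
c = [1.0, -2.0, 4.0]
y = [[2.0 * xit + ci for xit in xi] for xi, ci in zip(x, c)]
b_fe, mu_c_hat = mean_heterogeneity(y, x)
print(b_fe, mu_c_hat)
```

Because there is no idiosyncratic noise in this example, the FE slope recovers 2 and $\hat\mu_c$ equals the sample average of the $c_i$, which is 1.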

b. There is more than one way to estimate $\mu_g$, but a simple approach is to first difference,
giving
\[ \Delta y_{it} = g_i + \Delta x_{it}\beta + \Delta u_{it}, \quad t = 2,\dots,T. \]
Then we can estimate $\beta$ by fixed effects on the first differences (using $t = 2,\dots,T$), and then
apply the estimator from part a to the first-differenced data. This means we just replace $\bar y_i$ with
$\overline{\Delta y}_i$ and $\bar x_i$ with $\overline{\Delta x}_i$ everywhere (and the time averages are based on $T - 1$, not $T$, time periods).

11.5. a. $E(v_i|z_i,x_i) = Z_i[E(a_i|z_i,x_i) - \alpha] + E(u_i|z_i,x_i) = Z_i(\alpha - \alpha) + 0 = 0$. Next,
\[ \mathrm{Var}(v_i|z_i,x_i) = Z_i\mathrm{Var}(a_i|z_i,x_i)Z_i' + \mathrm{Var}(u_i|z_i,x_i) + Z_i\mathrm{Cov}(a_i,u_i|z_i,x_i) + \mathrm{Cov}(u_i,a_i|z_i,x_i)Z_i' \]
\[ = Z_i\mathrm{Var}(a_i|z_i,x_i)Z_i' + \mathrm{Var}(u_i|z_i,x_i) \]
because $a_i$ and $u_i$ are uncorrelated, conditional on $(z_i,x_i)$, by Assumption FE.1′ and the usual
iterated expectations argument. Therefore, under the assumptions given,
\[ \mathrm{Var}(v_i|z_i,x_i) = Z_i\Lambda Z_i' + \sigma_u^2 I_T, \]
which shows that the conditional variance depends on
$z_i$. Unlike in the standard random effects model, there is conditional heteroskedasticity.

b. If we use the usual RE analysis, we are applying FGLS to the equation
$y_i = Z_i\alpha + X_i\beta + v_i$, where $v_i = Z_i(a_i - \alpha) + u_i$. From part a, we know that $E(v_i|x_i,z_i) = 0$,
and so the usual RE estimator is consistent (as $N \to \infty$ for fixed $T$) and $\sqrt{N}$-asymptotically
normal, provided the rank condition, Assumption RE.2, holds. (Remember, a feasible GLS
analysis with any $\hat\Omega$ will be consistent provided $\hat\Omega$ converges in probability to a nonsingular
matrix as $N \to \infty$. It need not be the case that $\mathrm{Var}(v_i|x_i,z_i) = \mathrm{plim}(\hat\Omega)$, or even that
$\mathrm{Var}(v_i) = \mathrm{plim}(\hat\Omega)$.)

From part a, we know that $\mathrm{Var}(v_i|x_i,z_i)$ depends on $z_i$ unless we restrict almost all
elements of $\Lambda$ to be zero (all but those corresponding to the constant in $z_{it}$). Therefore, the
usual random effects inference (that is, inference based on the usual RE variance matrix estimator) will
be invalid.

c. We can easily make the RE analysis fully robust to an arbitrary $\mathrm{Var}(v_i|x_i,z_i)$, as in
equation (7.52). Naturally, we expand the set of explanatory variables to $(z_{it}, x_{it})$, and we
estimate $\alpha$ along with $\beta$.

11.6. No. Assumption (11.42) maintains strict exogeneity of $\{w_{it}^*\}$ in (11.41), and strict
exogeneity clearly fails when $w_{it}^* = y_{i,t-1}$.

11.7. When $\lambda_t = \lambda/T$ for all $t$, we can rearrange (11.6) to get
\[ y_{it} = x_{it}\beta + \psi + \bar x_i\lambda + r_{it}, \quad t = 1,2,\dots,T. \]
Let $\hat\beta$ (along with $\hat\lambda$) denote the pooled OLS estimator from this equation. By standard results
on partitioned regression [for example, Davidson and MacKinnon (1993, Section 1.4)], $\hat\beta$ can
be obtained by the following two-step procedure:

(i) Regress $x_{it}$ on $\bar x_i$ across all $t$ and $i$, and save the $1 \times K$ vectors of residuals, say $\hat s_{it}$.
(ii) Regress $y_{it}$ on $\hat s_{it}$ across all $t$ and $i$. The OLS vector on $\hat s_{it}$ is $\hat\beta$.

We want to show that $\hat\beta$ is the FE estimator. Given that the FE estimator can be obtained by
pooled OLS of $y_{it}$ on $(x_{it} - \bar x_i)$, it suffices to show that $\hat s_{it} = x_{it} - \bar x_i$ for all $t$ and $i$. But
\[ \hat s_{it} = x_{it} - \bar x_i\Big(\sum_{i=1}^{N}\sum_{t=1}^{T}\bar x_i'\bar x_i\Big)^{-1}\Big(\sum_{i=1}^{N}\sum_{t=1}^{T}\bar x_i'x_{it}\Big) \]
and
\[ \sum_{i=1}^{N}\sum_{t=1}^{T}\bar x_i'x_{it} = \sum_{i=1}^{N}\bar x_i'\sum_{t=1}^{T}x_{it} = \sum_{i=1}^{N}T\bar x_i'\bar x_i = \sum_{i=1}^{N}\sum_{t=1}^{T}\bar x_i'\bar x_i. \]
It follows that $\hat s_{it} = x_{it} - \bar x_i I_K = x_{it} - \bar x_i$. This completes the proof.
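The key step, that regressing $x_{it}$ on $\bar x_i$ gives a coefficient of one, can be verified on any panel. The sketch below does this for a scalar regressor (so $I_K$ is just 1) using made-up numbers, and, like steps (i) and (ii), it runs the regression without an intercept; the function name and data are hypothetical.

```python
def slope_on_time_average(x):
    """Pooled OLS slope (no intercept) from regressing x_it on xbar_i; x is a list of per-unit series."""
    num = den = 0.0
    for xi in x:
        xbar = sum(xi) / len(xi)
        for xit in xi:
            num += xbar * xit
            den += xbar * xbar
    return num / den


# Made-up scalar regressor for three units and three periods.
x = [[1.0, 2.0, 4.0], [0.0, 3.0, 3.0], [2.0, 2.0, 5.0]]
slope = slope_on_time_average(x)
# Residuals from step (i) and the time-demeaned regressor, unit by unit.
residuals = [[xit - slope * sum(xi) / len(xi) for xit in xi] for xi in x]
demeaned = [[xit - sum(xi) / len(xi) for xit in xi] for xi in x]
print(slope)
```

Up to floating-point rounding the slope is one and the step (i) residuals coincide with the time-demeaned regressor.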

11.8. a. This is just a special case of Problem 8.8, where we now apply the results to the FD
equation and account for the loss of the first time period. The rank condition is
\[ \mathrm{rank}\Big(\sum_{t=2}^{T}E(z_{it}'\Delta x_{it})\Big) = K. \]

b. Again, Problem 8.8 provides the answer. Letting $e_{it} = \Delta u_{it}$, $t \geq 2$, two sufficient
conditions are $\mathrm{Var}(e_{it}|z_{it}) = \sigma_e^2$, $t = 2,\dots,T$, and $E(e_{it}|z_{it}, e_{i,t-1},\dots,z_{i2},e_{i2}) = 0$, $t = 2,\dots,T$.

c. As in the case of pooled OLS after first differencing, this is only useful (and can only be
implemented) when $T \geq 3$. First, estimate equation (11.100) by pooled 2SLS and obtain the
residuals, $\hat e_{it}$, $t = 2,\dots,T$, $i = 1,\dots,N$. Then, estimate the augmented equation,
\[ \Delta y_{it} = \Delta x_{it}\beta + \rho\hat e_{i,t-1} + error_{it}, \quad t = 3,\dots,T, \]
by pooled 2SLS, using IVs $(z_{it}, \hat e_{i,t-1})$. If we strengthen the condition from part b to
\[ E(e_{it}|z_{it}, \Delta x_{i,t-1}, e_{i,t-1},\dots,z_{i2},\Delta x_{i2},e_{i2}) = 0 \]
then, under $H_0$, the usual $t$ statistic on $\hat e_{i,t-1}$ is distributed as asymptotically standard normal,
provided we add a dynamic homoskedasticity assumption. See Problem 8.10 for verification in
a general IV setting.

11.9. a. We can apply Problem 8.8.b because we are applying pooled 2SLS, this time to
the time-demeaned equation. Therefore, the rank condition is
\[ \mathrm{rank}\Big(\sum_{t=1}^{T}E(\ddot z_{it}'\ddot x_{it})\Big) = K. \]
The rank condition clearly fails if $x_{it}$ contains any time-constant explanatory variables (across
all $i$, as usual). The condition $\mathrm{rank}\big(\sum_{t=1}^{T}E(\ddot z_{it}'\ddot z_{it})\big) = L$ also should be assumed, and this rules
out time-constant instruments (and perfectly collinear instruments). If the rank condition holds,
we can always redefine $z_{it}$ so that $\sum_{t=1}^{T}E(\ddot z_{it}'\ddot z_{it})$ has full rank.

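b. We can apply the results on GMM estimation in Chapter 8. In particular, in equation
(8.25), take $C = E(\ddot Z_i'\ddot X_i)$, $W = [E(\ddot Z_i'\ddot Z_i)]^{-1}$, $\Lambda = E(\ddot Z_i'\ddot u_i\ddot u_i'\ddot Z_i)$.

A key point is that $\ddot Z_i'\ddot u_i = (Q_T Z_i)'(Q_T u_i) = Z_i'Q_T u_i = \ddot Z_i'u_i$, where $Q_T$ is the $T \times T$
time-demeaning matrix defined in Chapter 10. Under Assumption FEIV.3, $E(u_iu_i'|\ddot Z_i) = \sigma_u^2 I_T$
(by the usual iterated expectations argument), and so $\Lambda = \sigma_u^2 E(\ddot Z_i'\ddot Z_i)$. If we plug these choices
of $C$, $W$, and $\Lambda$ into (8.29) and simplify, we obtain
\[ \mathrm{Avar}\,\sqrt{N}(\hat\beta - \beta) = \sigma_u^2\{E(\ddot X_i'\ddot Z_i)[E(\ddot Z_i'\ddot Z_i)]^{-1}E(\ddot Z_i'\ddot X_i)\}^{-1}. \]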

c. The argument is very similar to the case of the fixed effects estimator. First, we already
showed in Chapter 10 that $\sum_{t=1}^{T}E(\ddot u_{it}^2) = (T-1)\sigma_u^2$. If $\hat{\ddot u}_{it} = \ddot y_{it} - \ddot x_{it}\hat\beta$ are the pooled 2SLS
residuals applied to the time-demeaned data, then $[N(T-1)]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{\ddot u}_{it}^2$ is a consistent
estimator of $\sigma_u^2$. Typically, $N(T-1)$ would be replaced by $N(T-1) - K$ as a degrees of
freedom adjustment.

d. From Problem 5.1 (which is purely algebraic, and so applies directly to pooled 2SLS,
even with lots of dummy variables) the 2SLS estimates, including $\hat\beta$, can be obtained as
follows. First, run the regression $x_{it}$ on $d1_i,\dots,dN_i$, $z_{it}$ across all $t$ and $i$, and obtain the
residuals, say $\hat r_{it}$. Second, obtain $\hat c_1,\dots,\hat c_N$, $\hat\beta$ from the pooled regression $y_{it}$ on $d1_i,\dots,dN_i$, $x_{it}$,
$\hat r_{it}$. Now, by algebra of partial regression, $\hat\beta$ and the coefficient on $\hat r_{it}$, say $\hat\delta$, from this last
regression can be obtained by first partialling out the dummy variables, $d1_i,\dots,dN_i$. As we
know from Chapter 10, this partialling out is equivalent to time demeaning all variables.
Therefore, $\hat\beta$ and $\hat\delta$ can be obtained from the pooled regression $\ddot y_{it}$ on $\ddot x_{it}$, $\hat r_{it}$, where we use the
fact that the time average of $\hat r_{it}$ for each $i$ is identically zero.

Now consider the 2SLS estimator of $\beta$ from
\[ \ddot y_{it} = \ddot x_{it}\beta + \ddot u_{it} \qquad (11.102) \]
using IVs $\ddot z_{it}$. Again appealing to Problem 5.1, the pooled 2SLS estimator can be obtained from
regressing $\ddot x_{it}$ on $\ddot z_{it}$ and saving the residuals, say $\hat s_{it}$, and then running the OLS regression $\ddot y_{it}$
on $\ddot x_{it}$, $\hat s_{it}$. By partial regression and the fact that regressing on $d1_i,\dots,dN_i$ results in time
demeaning, $\hat s_{it} = \hat r_{it}$ for all $i$ and $t$. This proves that the 2SLS estimates of $\beta$ from (11.102) and
\[ y_{it} = c_1 d1_i + c_2 d2_i + \dots + c_N dN_i + x_{it}\beta + u_{it} \qquad (11.103) \]
are identical.
e. By writing down the first order condition for the 2SLS estimates from (11.103) (with $dn_i$
as their own instruments, and $\hat x_{it}$ as the IVs for $x_{it}$), it is easy to show that $\hat c_i = \bar y_i - \bar x_i\hat\beta$, where
$\hat\beta$ is the FE2SLS estimator. Therefore, the 2SLS residuals from (11.103) are computed as
\[ y_{it} - \hat c_i - x_{it}\hat\beta = y_{it} - (\bar y_i - \bar x_i\hat\beta) - x_{it}\hat\beta = (y_{it} - \bar y_i) - (x_{it} - \bar x_i)\hat\beta = \ddot y_{it} - \ddot x_{it}\hat\beta, \]
which are exactly the 2SLS residuals from (11.102). Because the $N$ dummy variables are
explicitly included in (11.103), the degrees of freedom in estimating $\sigma_u^2$ from part c are
properly calculated.

The general, messy estimator in equation (8.31) should be used, where $X$ and $Z$ are
replaced with $\ddot X$ and $\ddot Z$, respectively, $\hat W = (\ddot Z'\ddot Z/N)^{-1}$, $\hat{\ddot u}_i = \ddot y_i - \ddot X_i\hat\beta$, and
\[ \hat\Lambda = N^{-1}\sum_{i=1}^{N}\ddot Z_i'\hat{\ddot u}_i\hat{\ddot u}_i'\ddot Z_i. \]

11.10. Let $\hat a_i$, $i = 1,\dots,N$, and $\hat\beta$ be the OLS estimates from the pooled OLS regression
(11.101). By partial regression, $\hat\beta$ can be obtained by first regressing $y_{it}$ on $d1_i z_{it}$, $d2_i z_{it}$, ...,
$dN_i z_{it}$ and obtaining the residuals, $\ddot y_{it}$, and likewise for $\ddot x_{it}$. Then, we regress $\ddot y_{it}$ on $\ddot x_{it}$,
$t = 1,\dots,T$; $i = 1,\dots,N$. But regressing on $d1_i z_{it}$, $d2_i z_{it}$, ..., $dN_i z_{it}$ across all $t$ and $i$ is the
same as regressing on $z_{it}$, $t = 1,\dots,T$, for each cross section observation, $i$. Therefore, we can
write
\[ \ddot y_{it} = y_{it} - z_{it}[(Z_i'Z_i)^{-1}Z_i'y_i], \qquad \ddot y_i = M_i y_i, \]
where $M_i = I_T - Z_i(Z_i'Z_i)^{-1}Z_i'$. A similar expression holds for $\ddot x_{it}$. We have shown that
regression (11.101) is identical to the pooled OLS regression $\ddot y_{it}$ on $\ddot x_{it}$,
$t = 1,\dots,T$, $i = 1,\dots,N$. The residuals from the two regressions are exactly the same by the
two-step projection result. The regression in (11.101) results in $NT - NJ - K = N(T - J) - K$
degrees of freedom, which is exactly what we need in (11.76).

11.11. Differencing twice and using the resulting cross section is easily done in most statistical packages. Equivalently, I can use fixed effects on the first-differenced equation (which is the same as FD on the FD equation).
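To see why differencing twice works, a short derivation may help. It is a sketch that assumes the random growth model has the usual form with a unit-specific trend $g_i t$; time dummies are omitted for brevity, and the symbols follow standard notation rather than the problem statement itself.

```latex
\begin{align*}
y_{it}        &= c_i + g_i t + x_{it}\beta + u_{it}, \\
\Delta y_{it} &= g_i + \Delta x_{it}\beta + \Delta u_{it}, \\
\Delta^2 y_{it} &= \Delta^2 x_{it}\beta + \Delta^2 u_{it}.
\end{align*}
```

First differencing removes $c_i$ and turns $g_i t$ into $g_i$, which then plays the role of a fixed effect in the FD equation. A second difference, or fixed effects applied to the FD equation, removes $g_i$.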

The Stata output follows. The estimates from the random growth model are pretty bad: the 
estimates on the grant variables are of the “wrong” sign, and they are very imprecise. 

The joint F test for the 53 different firm intercepts (when we treat the heterogeneity as estimable parameters) is significant at the 5% level (p-value = .033), which does suggest a random growth model is appropriate. (But remember, this statistic is only valid under restrictive assumptions.) It is hard to know what to make of the poor estimates, but it does cast doubt on the standard unobserved effects model without a random growth term.

xtreg clscrap d89 cgrant cgrant_1, fe

Fixed-effects (within) regression               Number of obs      =       108
Group variable: fcode                           Number of groups   =        54
R-sq:  within  = 0.0577                         Obs per group: min =         2
       between = 0.0476                                        avg =       2.0
       overall = 0.0050                                        max =         2
                                                F(3,51)            =      1.04
corr(u_i, Xb)  = -0.4011                        Prob > F           =    0.3826
------------------------------------------------------------------------------
     clscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d89 |  -.2377384   .1407362    -1.69   0.097    -.5202783    .0448014
      cgrant |   .1564748   .2632934     0.59   0.555    -.3721088    .6850584
    cgrant_1 |   .6099015   .6343411     0.96   0.341    -.6635913    1.883394
       _cons |  -.2240491    .114748    -1.95   0.056    -.4544153    .0063171
-------------+----------------------------------------------------------------
     sigma_u |  .50956703
     sigma_e |  .49757778
         rho |  .51190251   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:                                       Prob > F = 0.0334
11.12. a. Using only the changes from 1990 to 1993 and estimating the first-differenced equation by OLS gives:

reg cmrdrte cexec cunem if d93

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  2,    48) =    2.96
       Model |   6.8879023     2  3.44395115           Prob > F      =  0.0614
    Residual |  55.8724857    48  1.16401012           R-squared     =  0.1097
-------------+------------------------------           Adj R-squared =  0.0727
       Total |   62.760388    50  1.25520776           Root MSE      =  1.0789
------------------------------------------------------------------------------
     cmrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cexec |  -.1038396   .0434139    -2.39   0.021    -.1911292     -.01655
       cunem |  -.0665914   .1586859    -0.42   0.677    -.3856509     .252468
       _cons |   .4132665   .2093848     1.97   0.054    -.0077298    .8342628
------------------------------------------------------------------------------


The coefficient on cexec means that one more execution reduces the murder rate by about 
.10, and the effect is statistically significant. 
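As a quick check on how the reported statistics fit together, the t statistic and the critical value implied by the printed confidence interval can be recomputed from the coefficient and standard error. This is only a sketch; the numbers are copied from the Stata output above, and the variable names are illustrative.

```python
# Values copied from the Stata output for the regression of cmrdrte on cexec and cunem.
coef_cexec = -0.1038396
se_cexec = 0.0434139
ci_upper = -0.01655

# t statistic for H0: coefficient on cexec equals zero
t_stat = coef_cexec / se_cexec

# Critical value implied by the printed 95% interval: (upper - coef) / se
implied_crit = (ci_upper - coef_cexec) / se_cexec

print(round(t_stat, 2), round(implied_crit, 3))
```

The implied critical value is close to 2.01, which is what a t distribution with 48 degrees of freedom would give, rather than the normal value of 1.96.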
b. If executions in the future respond to changes in the past murder rate, then exec may not be strictly exogenous. If executions more than three years ago have a partial effect on the murder rate, this would also violate strict exogeneity because, effectively, we do not have enough lags. In principle, we could handle the latter problem by collecting more data and including more lags.


c. To test the rank condition, we regress $\Delta exec_{it}$ on 1, $\Delta exec_{i,t-1}$, $\Delta unem_{it}$ for 1993, and do a t test on $\Delta exec_{i,t-1}$:

reg cexec cexec_1 cunem if d93

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  2,    48) =   20.09
       Model |  281.429488     2  140.714744           Prob > F      =  0.0000
    Residual |  336.217571    48  7.00453273           R-squared     =  0.4556
-------------+------------------------------           Adj R-squared =  0.4330
       Total |  617.647059    50  12.3529412           Root MSE      =  2.6466
------------------------------------------------------------------------------
       cexec |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     cexec_1 |   -1.08241   .1707822    -6.34   0.000     -1.42579   -.7390289
       cunem |   .0400493   .3892505     0.10   0.918    -.7425912    .8226898
       _cons |   .3139609   .5116532     0.61   0.542    -.7147868    1.342709
------------------------------------------------------------------------------


Interestingly, there is a one-for-one negative relationship between the change in lagged 
executions and the change in current executions. Certainly the rank condition passes. 


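The IV estimates are below:

reg cmrdrte cexec cunem (cexec_1 cunem) if d93

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =      51
-------------+------------------------------           F(  2,    48) =    1.31
       Model |  6.87925253     2  3.43962627           Prob > F      =  0.2796
    Residual |  55.8811355    48  1.16419032           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |   62.760388    50  1.25520776           Root MSE      =
------------------------------------------------------------------------------
     cmrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cexec |  -.1000972   .0643241    -1.56   0.126    -.2294293     .029235
       cunem |  -.0667262   .1587074    -0.42   0.676    -.3858289    .2523764
       _cons |    .410966   .2114237     1.94   0.058    -.0141298    .8360617
------------------------------------------------------------------------------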


The point estimate on $\Delta exec$ is essentially the same as the OLS estimate, but, of course, the IV standard error is larger. We can justify the POLS estimator on the FD equation (as the null of exogeneity of $\Delta exec$ would not be rejected).
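One informal way to see why exogeneity would not be rejected is a simple Hausman-type comparison of the OLS and IV coefficients on cexec. The sketch below is not the test run in the text. It assumes homoskedasticity, so that the variance of the difference is the difference of the variances, and it uses the printed standard errors even though the two regressions use slightly different error variance estimates.

```python
import math

# Estimates and standard errors copied from the OLS and 2SLS output above.
b_ols, se_ols = -0.1038396, 0.0434139   # OLS on the FD equation
b_iv, se_iv = -0.1000972, 0.0643241     # 2SLS on the FD equation

diff = b_iv - b_ols

# Under the null of exogeneity (and homoskedasticity),
# Var(b_iv - b_ols) = Var(b_iv) - Var(b_ols).
se_diff = math.sqrt(se_iv ** 2 - se_ols ** 2)

hausman_t = diff / se_diff
print(round(hausman_t, 3))
```

The resulting statistic is far below conventional critical values, which is consistent with the statement that the two point estimates are essentially the same.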


d. The following Stata command gives the results without Texas:

reg cmrdrte cexec cunem if (d93==1 & state!= "TX")

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  2,    47) =
       Model |  .755191109     2  .377595555           Prob > F      =
    Residual |  55.7000012    47  1.18510641           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |  56.4551923    49  1.15214678           Root MSE      =
------------------------------------------------------------------------------
     cmrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cexec |   -.067471    .104913    -0.64   0.523    -.2785288    .1435868
       cunem |  -.0700316   .1603712    -0.44   0.664    -.3926569    .2525936
       _cons |   .4125226   .2112827     1.95   0.057    -.0125233    .8375686
------------------------------------------------------------------------------

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  2,    47) =
       Model | -1.65785462     2 -.828927308           Prob > F      =
    Residual |  58.1130469    47  1.23644781           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |  56.4551923    49  1.15214678           Root MSE      =
------------------------------------------------------------------------------
     cmrdrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cexec |    .082233    .804114     0.10   0.919    -1.535436    1.699902
       cunem |  -.0826635   .1770735    -0.47   0.643    -.4388895    .2735624
       _cons |   .3939505   .2373797     1.66   0.104    -.0835958    .8714968
------------------------------------------------------------------------------


The OLS estimate is smaller in magnitude and not statistically significant, while the IV estimate actually changes sign (but, statistically, is not different from zero). Clearly, including Texas in the estimation has a big impact. It is easy to see why this is the case by listing the change in the murder rates and executions for Texas along with the averages for all states:

. list cmrdrte cexec if (d93==1 & state =="TX")

    Variable |       Obs        Mean    Std. Dev.        Min         Max
-------------+----------------------------------------------------------
     cmrdrte |        51    .2862745    1.120361   -2.200001    3.099998
       cexec |        51    .6470588    3.514675          -3          23

Texas has the largest drop in the murder rate from 1990 to 1993, and also the largest increase in the number of executions. This does not necessarily mean Texas is an outlier, but it clearly is an influential observation. And it is clear why including Texas makes for a fairly strong deterrent effect.


11.13. a. The following Stata output estimates the reduced form for $\Delta\log(pris)$ and tests joint significance of final1 and final2, and also tests equality of the coefficients on final1 and final2. The latter is actually not very interesting. Technically, because we do not reject, we could reduce our instrument to final1 + final2, but we could always look ex post for restrictions on the parameters in a reduced form.

use prison

. xtset state year
       panel variable:  state (strongly balanced)
        time variable:  year, 80 to 93
                delta:  1 unit

reg gpris final1 final2 gpolpc gincpc cunem cblack cmetro cag0_14 cag15_17
    cag18_24 cag25_34 y81-y93

      Source |       SS       df       MS              Number of obs =     714
-------------+------------------------------           F( 24,   689) =    5.15
       Model |  .481041472    24  .020043395           Prob > F      =
    Residual |  2.68006631   689  .003889791           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |  3.16110778   713  .004433531           Root MSE      =
--------------------------------------------------------------
       gpris |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
      final1 |   -.077488   .0259556    -.1284496   -.0265265
      final2 |  -.0529558   .0184078    -.0890979   -.0168136
      gpolpc |  -.0286921   .0440058    -.1150937    .0577094
      gincpc |   .2095521   .1313169    -.0482772    .4673815
       cunem |   .1616595   .3111688    -.4492935    .7726124
      cblack |  -.0044763   .0262118     -.055941    .0469883
      cmetro |  -1.418389   .7860435    -2.961717    .1249393
     cag0_14 |   2.617307   1.582611    -.4900126    5.724627
    cag15_17 |  -1.608738   3.755564    -8.982461    5.764986
    cag18_24 |   .9533678   1.731188    -2.445669    4.352405
    cag25_34 |  -1.031684   1.763248    -4.493667      2.4303
         y81 |   .0124113    .013763    -.0146111    .0394337
         y82 |   .0773503   .0156924     .0465396     .108161
         y83 |   .0767785   .0153929     .0465559    .1070011
         y84 |   .0289763   .0176504    -.0056787    .0636314
         y85 |   .0279051   .0164176    -.0043295    .0601397
         y86 |   .0541489   .0179305      .018944    .0893539
         y87 |   .0312716   .0171317     -.002365    .0649082
         y88 |    .019245   .0170725    -.0142754    .0527654
         y89 |   .0184651   .0172867    -.0154759     .052406
         y90 |   .0635926   .0165775     .0310442    .0961411
         y91 |   .0263719   .0168913    -.0067927    .0595366
         y92 |   .0190481   .0179372    -.0161701    .0542663
         y93 |   .0134109   .0189757    -.0238461     .050668
       _cons |   .0272013   .0170478    -.0062705    .0606731
--------------------------------------------------------------

test final1 final2

 ( 1)  final1 = 0
 ( 2)  final2 = 0
       F(  2,   689) =
            Prob > F =

test final1 = final2

 ( 1)  final1 - final2 = 0
       F(  1,   689) =    0.60
            Prob > F =    0.4401

Jointly, final1 and final2 are pretty significant. Next, test for serial correlation in $a_{it} \equiv \Delta v_{it}$:

predict ahat, resid

gen ahat_1 = l.ahat
(51 missing values generated)

reg ahat ahat_1

      Source |       SS       df       MS              Number of obs =     663
-------------+------------------------------           F(  1,   661) =   14.33
       Model |  .051681199     1  .051681199           Prob > F      =  0.0002
    Residual |  2.38322468   661  .003605484           R-squared     =  0.0212
-------------+------------------------------           Adj R-squared =  0.0197
       Total |  2.43490588   662  .003678106           Root MSE      =  .06005
------------------------------------------------------------------------------
        ahat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ahat_1 |   .1426247   .0376713     3.79   0.000     .0686549    .2165945
       _cons |   4.24e-11    .002332     0.00   1.000     -.004579     .004579
------------------------------------------------------------------------------


There is strong evidence of positive serial correlation, although the estimated size of the AR(1) coefficient, .143, is not especially large. Still, a fully robust variance matrix should be used for the joint significance test of final1 and final2. These two IVs are much more significant when the robust variance matrix is used:

qui reg gpris final1 final2 gpolpc gincpc cunem cblack cmetro cag0_14
    cag15_17 cag18_24 cag25_34 y81-y93, cluster(state)

test final1 final2

 ( 1)  final1 = 0
 ( 2)  final2 = 0
       F(  2,    50) =   18.82
            Prob > F =    0.000


b. First, we do pooled 2SLS to obtain the 2SLS residuals, $\hat{e}_{it}$. Then we add the lagged residual to the equation, and use it as its own IV:

ivreg gcriv gpolpc gincpc cunem cblack cmetro cag0_14 cag15_17 cag18_24
    cag25_34 y81-y93 (gpris = final1 final2)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     714
-------------+------------------------------           F( 23,   690) =    6.08
       Model | -.696961613    23 -.030302679           Prob > F      =  0.0000
    Residual |  6.28846843   690  .009113722           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  5.59150682   713  .007842226           Root MSE      =  .09547
------------------------------------------------------------------------------
       gcriv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       gpris |  -1.031956   .3699628    -2.79   0.005    -1.758344   -.3055684
      gpolpc |    .035315   .0674989     0.52   0.601    -.0972128    .1678428
      gincpc |   .9101992   .2143266     4.25   0.000     .4893885     1.33101
       cunem |   .5236958   .4785632     1.09   0.274     -.415919     1.46331
      cblack |  -.0158476   .0401044    -0.40   0.693    -.0945889    .0628937
      cmetro |   -.591517   1.298252    -0.46   0.649    -3.140516    1.957482
     cag0_14 |   3.379384   2.634893     1.28   0.200    -1.793985    8.552753
    cag15_17 |   3.549945   5.766302     0.62   0.538    -7.771659    14.87155
    cag18_24 |   3.358348   2.680839     1.25   0.211    -1.905233    8.621929
    cag25_34 |   2.319993   2.706345     0.86   0.392    -2.993667    7.633652
         y81 |  -.0560732   .0217346    -2.58   0.010    -.0987471   -.0133992
         y82 |   .0284616   .0384773     0.74   0.460     -.047085    .1040082
         y83 |    .024703   .0373965     0.66   0.509    -.0487216    .0981276
         y84 |   .0128703   .0293337     0.44   0.661    -.0447236    .0704643
         y85 |   .0354026   .0275023     1.29   0.198    -.0185956    .0894008
         y86 |   .0921857   .0343884     2.68   0.008     .0246672    .1597042
         y87 |    .004771   .0290145     0.16   0.869    -.0521964    .0617383
         y88 |   .0532706   .0273221     1.95   0.052    -.0003738     .106915
         y89 |   .0430862   .0275204     1.57   0.118    -.0109476    .0971201
         y90 |   .1442652   .0354625     4.07   0.000     .0746379    .2138925
         y91 |   .0618481   .0276502     2.24   0.026     .0075595    .1161366
         y92 |   .0266574   .0285333     0.93   0.350    -.0293651    .0826799
         y93 |   .0222739   .0296099     0.75   0.452    -.0358624    .0804103
       _cons |   .0148377   .0275197     0.54   0.590    -.0391948    .0688702
------------------------------------------------------------------------------
Instrumented:  gpris
Instruments:   gpolpc gincpc cunem cblack cmetro cag0_14 cag15_17 cag18_24
               cag25_34 y81 y82 y83 y84 y85 y86 y87 y88 y89 y90 y91 y92 y93
               final1 final2


predict ehat, resid

gen ehat_1 = l.ehat
(51 missing values generated)

ivreg gcriv gpolpc gincpc cunem cblack cmetro cag0_14 cag15_17 cag18_24
    cag25_34 y81-y93 ehat_1 (gpris = final1 final2)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     663
-------------+------------------------------           F( 23,   639) =    5.14
       Model | -.815873465    23 -.035472759           Prob > F      =  0.0000
    Residual |  5.90425699   639  .009239839           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  5.08838353   662   .00768638           Root MSE      =  .09612
------------------------------------------------------------------------------
       gcriv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       gpris |  -1.084446   .4071905    -2.66   0.008    -1.884039   -.2848525
      gpolpc |   .0179121   .0719595     0.25   0.804    -.1233935    .1592176
      gincpc |   .7492611   .2421405     3.09   0.002     .2737738    1.224748
       cunem |   .1979701    .515973     0.38   0.701    -.8152375    1.211178
      cblack |  -.0102865   .0424589    -0.24   0.809    -.0936622    .0730893
      cmetro |  -.5272326   1.357715    -0.39   0.698    -3.193354    2.138889
     cag0_14 |   3.284496   3.045539     1.08   0.281    -2.695979     9.26497
    cag15_17 |    .066451   6.105497     0.01   0.991    -11.92281    12.05571
    cag18_24 |   3.094998   2.830038     1.09   0.275    -2.462301    8.652297
    cag25_34 |   2.716353   2.799581     0.97   0.332    -2.781137    8.213843
         y81 |  -.0782703   .0350721    -2.23   0.026    -.1471409   -.0093998
         y82 |   .0090276   .0225246     0.40   0.689    -.0352036    .0532588
         y83 |  (dropped)
         y84 |  -.0113602   .0314408    -0.36   0.718       -.0731    .0503796
         y85 |    .015744   .0309473     0.51   0.611    -.0450267    .0765148
         y86 |   .0752485    .027649     2.72   0.007     .0209547    .1295424
         y87 |  -.0205808   .0282106    -0.73   0.466    -.0759774    .0348159
         y88 |   .0265964   .0315542     0.84   0.400    -.0353661    .0885589
         y89 |   .0182293   .0327158     0.56   0.578    -.0460142    .0824727
         y90 |   .1275351   .0235386     5.42   0.000     .0813126    .1737575
         y91 |   .0435859   .0315328     1.38   0.167    -.0183346    .1055064
         y92 |   .0121958   .0354112     0.34   0.731    -.0573406    .0817321
         y93 |   .0016107   .0365807     0.04   0.965    -.0702221    .0734435
      ehat_1 |   .0763754   .0456451     1.67   0.095    -.0132571     .166008
       _cons |   .0441747   .0477902     0.92   0.356    -.0496701    .1380195
------------------------------------------------------------------------------
Instrumented:  gpris
Instruments:   gpolpc gincpc cunem cblack cmetro cag0_14 cag15_17 cag18_24
               cag25_34 y81 y82 y83 y84 y85 y86 y87 y88 y89 y90 y91 y92 y93
               ehat_1 final1 final2

There is only marginal evidence of positive serial correlation, and it is practically small, anyway ($\hat{\rho} = .076$).

c. Adding a state effect to the change (FD) equation changes very little. In this example, there seems to be little need for a random growth model. The estimated prison effect becomes a little smaller in magnitude, -.959. Here is the Stata output:

xtivreg gcriv gpolpc gincpc cunem cblack cmetro cag0_14 cag15_17 cag18_24
    cag25_34 y81-y93 (gpris = final1 final2), fe

Fixed-effects (within) IV regression            Number of obs      =       714
Group variable: state                           Number of groups   =        51
R-sq:  within  = .                              Obs per group: min =        14
       between = 0.0001                                        avg =      14.0
       overall = 0.1298                                        max =        14
                                                Wald chi2(23)      =    179.24
corr(u_i, Xb)  = -0.2529                        Prob > chi2        =    0.0000
------------------------------------------------------------------------------
       gcriv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       gpris |  -.9592287   .3950366    -2.43   0.015    -1.733486   -.1849713
      gpolpc |     .04445   .0664696     0.67   0.504    -.0858281    .1747281
      gincpc |   1.027161   .2157944     4.76   0.000     .6042122    1.450111
       cunem |   .6560942   .4698359     1.40   0.163    -.2647672    1.576956
      cblack |   .0706601   .1496426     0.47   0.637    -.2226339    .3639542
      cmetro |   3.229287   4.683812     0.69   0.491    -5.950815    12.40939
     cag0_14 |    1.14119   2.749679     0.42   0.678    -4.248082    6.530462
    cag15_17 |   1.402606   6.330461     0.22   0.825    -11.00487    13.81008
    cag18_24 |   1.169114   2.866042     0.41   0.683    -4.448225    6.786453
    cag25_34 |  -2.089449   3.383237    -0.62   0.537    -8.720471    4.541574
         y81 |  -.0590819   .0230252    -2.57   0.010    -.1042104   -.0139534
         y82 |   .0033116   .0388056     0.09   0.932    -.0727459    .0793691
         y83 |   .0080099   .0378644     0.21   0.832     -.066203    .0822228
         y84 |  -.0019285   .0293861    -0.07   0.948    -.0595243    .0556672
         y85 |   .0220412   .0276807     0.80   0.426     -.032212    .0762945
         y86 |    .075621   .0338898     2.23   0.026     .0091981    .1420438
         y87 |  -.0124835   .0294198    -0.42   0.671    -.0701453    .0451783
         y88 |   .0329977   .0286125     1.15   0.249    -.0230817    .0890771
         y89 |    .018718   .0292666     0.64   0.522    -.0386434    .0760794
         y90 |   .1157811   .0354143     3.27   0.001     .0463703    .1851919
         y91 |   .0378784   .0290414     1.30   0.192    -.0190417    .0947984
         y92 |  -.0006633   .0305014    -0.02   0.983    -.0604449    .0591184
         y93 |  -.0007561   .0317733    -0.02   0.981    -.0630306    .0615184
       _cons |   .0014574   .0296182     0.05   0.961    -.0565932    .0595079
-------------+----------------------------------------------------------------
     sigma_u |  .03039696
     sigma_e |   .0924926
         rho |  .09747751   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(50,640) =     0.69          Prob > F    = 0.9459
Instrumented:  gpris
Instruments:   gpolpc gincpc cunem cblack cmetro cag0_14 cag15_17 cag18_24
               cag25_34 y81 y82 y83 y84 y85 y86 y87 y88 y89 y90 y91 y92 y93
               final1 final2


d. When we use the property crime rate, the estimated elasticity with respect to prison size is substantially smaller, but still negative and marginally significant:

ivreg gcrip gpolpc gincpc cunem cblack cmetro cag0_14 cag15_17 cag18_24
    cag25_34 y81-y93 (gpris = final1 final2)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F( 23,   690) =
       Model |  1.07170564    23  .046595897           Prob > F      =
    Residual |   1.5490539   690  .002245006           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |  2.62075954   713   .00367568           Root MSE      =
------------------------------------------------------------------------------
       gcrip |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       gpris |  -.3285567   .1836195    -1.79            -.6890768    .0319633
      gpolpc |    .014567    .033501     0.43             -.051209    .0803431
      gincpc |   .0560822   .1063744     0.53            -.1527741    .2649385
       cunem |   .8583588   .2375199     3.61             .3920102    1.324707
      cblack |  -.0507462   .0199046    -2.55             -.089827   -.0116654
      cmetro |   .0404892   .6443472     0.06            -1.224627    1.305606
     cag0_14 |   1.890526   1.307747     1.45   0.149    -.6771151    4.458167
    cag15_17 |   5.699448   2.861925     1.99   0.047     .0803221    11.31857
    cag18_24 |   1.712283   1.330551     1.29   0.199    -.9001312    4.324698
    cag25_34 |   2.027833    1.34321     1.51   0.132    -.6094366    4.665102
         y81 |  -.0771684   .0107873    -7.15   0.000    -.0983483   -.0559886
         y82 |  -.0980884    .019097    -5.14   0.000    -.1355836   -.0605932
         y83 |  -.1093989   .0185606    -5.89   0.000    -.1458409   -.0729569
         y84 |  -.0810119   .0145589    -5.56   0.000    -.1095968   -.0524269
         y85 |   -.031369   .0136499    -2.30   0.022    -.0581693   -.0045687
         y86 |  -.0169451   .0170676    -0.99   0.321    -.0504558    .0165656
         y87 |  -.0310865   .0144005    -2.16   0.031    -.0593605   -.0028125
         y88 |  -.0437643   .0135605    -3.23   0.001    -.0703891   -.0171396
         y89 |  -.0359254   .0136589    -2.63   0.009    -.0627434   -.0091074
         y90 |  -.0298029   .0176007    -1.69   0.091    -.0643603    .0047544
         y91 |  -.0505269   .0137233    -3.68   0.000    -.0774713   -.0235824
         y92 |  -.1024579   .0141616    -7.23   0.000    -.1302629   -.0746529
         y93 |  -.0867254    .014696    -5.90   0.000    -.1155796   -.0578712
       _cons |   .0857682   .0136586     6.28   0.000     .0589509    .1125856
------------------------------------------------------------------------------
Instrumented:  gpris
Instruments:   gpolpc gincpc cunem cblack cmetro cag0_14 cag15_17 cag18_24
               cag25_34 y81 y82 y83 y84 y85 y86 y87 y88 y89 y90 y91 y92 y93
               final1 final2

The test for serial correlation yields a coefficient on $\hat{e}_{i,t-1}$ of -.024 ($t = -.52$), and so we conclude that serial correlation is not an issue.


11.14. a. The fixed effects estimates of the first-differenced equation are given below. We have included year dummies without differencing them, since we are not interested in the time effects in the original model:

use ezunem

xtset city year
       panel variable:  city (strongly balanced)
        time variable:  year, 1980 to 1988
                delta:  1 unit

gen cezt = d.ezt
(22 missing values generated)

xtreg guclms cez cezt d82-d88, fe

Fixed-effects (within) regression               Number of obs      =       176
Group variable: city                            Number of groups   =
R-sq:  within  = 0.6406                         Obs per group: min =
       between = 0.0094                                        avg =
       overall = 0.6205                                        max =
                                                F(9,145)           =
corr(u_i, Xb)  = -0.0546                        Prob > F           =
------------------------------------------------------------------------------
      guclms |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         cez |   .1937324   .3448663     0.56   0.575    -.4878818    .8753467
        cezt |  -.0783638   .0679161    -1.15   0.250    -.2125972    .0558697
         d82 |   .7787595   .0675022    11.54   0.000     .6453442    .9121748
         d83 |  -.0331192   .0675022    -0.49   0.624    -.1665345    .1002961
         d84 |  -.0127177   .0713773    -0.18   0.859     -.153792    .1283566
         d85 |   .3616479   .0762138     4.75   0.000     .2110144    .5122814
         d86 |   .3277739   .0742264     4.42   0.000     .1810684    .4744793
         d87 |    .089568   .0742264     1.21   0.230    -.0571375    .2362735
         d88 |   .0185673   .0742264     0.25   0.803    -.1281381    .1652728
       _cons |  -.3216319   .0477312    -6.74   0.000    -.4159708   -.2272931
-------------+----------------------------------------------------------------
     sigma_u |  .05880562
     sigma_e |  .22387933
         rho |  .06454083   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(21, 145) =     0.49         Prob > F    = 0.9712

test cez cezt

 ( 1)  cez = 0
 ( 2)  cezt = 0
       F(  2,   145) =    3.22
            Prob > F =    0.0428

The coefficient $\hat{\delta}_2 = -.078$ gives the difference in annual growth rate due to EZ designation. It is not significant at the usual 5% level. Note that this formulation does not give the coefficient $\hat{\delta}_1$ a simple interpretation because zone designation happened either at $t = 5$ (if in 1984) or $t = 6$ (if in 1985). A better formulation centers the linear trend at the time of designation before constructing the interactions:
egen nyrsez = sum(ez), by(city)

gen ezt0 = 0 if ~ez
(46 missing values generated)

replace ezt0 = ez*(t-5) if nyrsez == 5
(30 real changes made)

replace ezt0 = ez*(t-6) if nyrsez == 4
(16 real changes made)

gen cezt0 = ezt0 - ezt0[_n-1] if year > 1980
(22 missing values generated)

xtreg guclms cez cezt0 d82-d88, fe

Fixed-effects (within) regression               Number of obs      =
Group variable: city                            Number of groups   =        22
R-sq:  within  = 0.6406                         Obs per group: min =
       between = 0.0025                                        avg =
       overall = 0.6185                                        max =
                                                F(9,145)           =
corr(u_i, Xb)  = -0.0630                        Prob > F           =
------------------------------------------------------------------------------
      guclms |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         cez |  -.2341545   .0924022    -2.53   0.012    -.4167837   -.0515252
       cezt0 |   -.082805   .0715745    -1.16   0.249     -.224269     .058659
         d82 |   .7787595   .0675005    11.54   0.000     .6453475    .9121716
         d83 |  -.0331192   .0675005    -0.49   0.624    -.1665312    .1002928
         d84 |  -.0028809   .0720513    -0.04   0.968    -.1452874    .1395256
         d85 |    .355169   .0740177     4.80   0.000      .208876    .5014621
         d86 |   .3297926   .0749318     4.40   0.000      .181693    .4718922
         d87 |   .0915867   .0749318     1.22   0.224    -.0565129    .2396864
         d88 |   .0205861   .0749318     0.27   0.784    -.1275135    .1686857
       _cons |  -.3216319   .0477301    -6.74   0.000    -.4159685   -.2272953
-------------+----------------------------------------------------------------
     sigma_u |  .06091433
     sigma_e |  .22387389
         rho |  .06893091   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(21, 145) =     0.50         Prob > F    = 0.9681

Now the coefficient on cez is the estimated effect of the EZ in the first year of designation, and that gets added to $-.083 \cdot$ (years since initial designation). This is easier to read.
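The interpretation of the centered specification can be made concrete with a small calculation. The sketch below uses the coefficients on cez and cezt0 reported above; the function name `ez_effect` is illustrative, and the reading of the result as an effect on log claims assumes the dependent variable is in logs, as in Example 11.4.

```python
# Coefficients copied from the regression with the centered trend (cez, cezt0).
b_cez = -0.2341545    # estimated effect in the first year of designation
b_cezt0 = -0.082805   # additional estimated effect per year since designation

def ez_effect(years_since_designation):
    """Estimated EZ effect on log claims, k years after initial designation."""
    return b_cez + b_cezt0 * years_since_designation

for k in range(4):
    print(k, round(ez_effect(k), 4))
```

Because the trend coefficient is imprecisely estimated, the later-year effects from this calculation carry considerably more uncertainty than the first-year effect.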


b. Setting $\delta_1 = 0$ gives a within R-squared of about .640, compared with that of the original model in Example 11.4 of about .637. The difference is minor, and we would probably go with the simpler, basic model in Example 11.4. With more years of data, the trend effect in part a might become significant.


c. Because the general model contains $c_i + g_i t$, we cannot distinguish the effects of a time-constant variable, $w_i$, or its interaction with a linear time trend, at least if we stay in a fixed effects framework. If we assume $c_i$ and $g_i$ are uncorrelated with $ez_i$ we could include $w_i$ and $w_i t$.

d. Yes. Provided $\{ez_{it} : t = 1, \ldots, T\}$ has the kind of variation that it does in this data set, $w_i ez_{it}$ is linearly independent from other covariates included in the model. Therefore, we can estimate $\eta$. If we add $h_i ez_{it}$ to the model, where $h_i$ is additional unobserved heterogeneity, then $\eta$ would not be identified (again, allowing $h_i$ to be correlated with $ez_i$).

11.15. a. We would have to assume that $grant_{it}$ is uncorrelated with the idiosyncratic errors, $u_{is}$, for all $t$ and $s$. One way to think of this assumption is that while grant designation may depend on firm heterogeneity $c_i$, it is not related to idiosyncratic fluctuations in any time period. Further, one must assume the grants have an effect on scrap rates only through their effects on job training, which is the standard assumption for an instrument.

b. The following simple regression shows that $\Delta hrsemp_{it}$ and $\Delta grant_{it}$ are highly positively correlated, as expected:

reg chrsemp cgrant if d88

      Source |       SS       df       MS              Number of obs =     125
-------------+------------------------------           F(  1,   123) =   79.37
       Model |  18117.5987     1  18117.5987           Prob > F      =  0.0000
    Residual |  28077.3319   123  228.270991           R-squared     =  0.3922
-------------+------------------------------           Adj R-squared =  0.3873
       Total |  46194.9306   124  372.539763           Root MSE      =  15.109
------------------------------------------------------------------------------
     chrsemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cgrant |   27.87793   3.129216     8.91   0.000     21.68384    34.07202
       _cons |   .5093234   1.558337     0.33   0.744     -2.57531    3.593956
------------------------------------------------------------------------------

Unfortunately, this is on a bigger sample than we can use to estimate the scrap rate equation, because the scrap rate is missing for so many firms. Restricted to that sample, we get:

reg chrsemp cgrant if d88 & clscrap ~= .

      Source |       SS       df       MS              Number of obs =      45
-------------+------------------------------           F(  1,    43) =   22.23
       Model |  6316.65458     1  6316.65458           Prob > F      =  0.0000
    Residual |  12217.3517    43  284.124457           R-squared     =  0.3408
-------------+------------------------------           Adj R-squared =  0.3255
       Total |  18534.0062    44  421.227414           Root MSE      =  16.856
------------------------------------------------------------------------------
     chrsemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cgrant |   24.43691   5.182712     4.72   0.000     13.98498    34.88885
       _cons |   1.580598   3.185483     0.50   0.622     -4.84354    8.004737
------------------------------------------------------------------------------

So there is still a pretty strong relationship, but we will be using IV on a small sample ($N = 45$).


c. The IV estimate is:

ivreg clscrap (chrsemp = cgrant) if d88

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =      45
-------------+------------------------------           F(  1,    43) =    3.20
       Model |  .274951237     1  .274951237           Prob > F      =  0.0808
    Residual |  17.0148885    43  .395695081           R-squared     =  0.0159
-------------+------------------------------           Adj R-squared = -0.0070
       Total |  17.2898397    44  .392950903           Root MSE      =  .62904
------------------------------------------------------------------------------
     clscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     chrsemp |  -.0141532   .0079147    -1.79   0.081    -.0301148    .0018084
       _cons |  -.0326684   .1269512    -0.26   0.798    -.2886898     .223353
------------------------------------------------------------------------------
Instrumented:  chrsemp
Instruments:   cgrant

The estimate says that 10 more hours training per employee would lower the average scrap 
rate by about 14.2 percent, which is a large economic effect. It is marginally statistically 
significant (assuming we can trust the asymptotic distribution theory for IV with 45 
observations). 
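The 14.2 percent figure uses the usual approximation that 100 times a change in logs is a percentage change. A minimal sketch of that arithmetic, alongside the exact percentage change implied by the same log change, is given below; the coefficient is copied from the IV output above and the variable names are illustrative.

```python
import math

# IV coefficient on chrsemp; the dependent variable is the change in log(scrap).
b_chrsemp = -0.0141532
extra_hours = 10

log_change = b_chrsemp * extra_hours

# Usual approximation: 100 * (change in logs)
approx_pct = 100 * log_change

# Exact percentage change implied by a given change in logs
exact_pct = 100 * (math.exp(log_change) - 1)

print(round(approx_pct, 1), round(exact_pct, 1))
```

The exact figure is somewhat smaller in magnitude than the approximation, but either way the estimated effect is economically large.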

d. The OLS estimate is only about -.0076, about half of the IV estimate, with $t = -1.68$.

reg clscrap chrsemp if d88

      Source |       SS       df       MS              Number of obs =      45
-------------+------------------------------           F(  1,    43) =    2.84
       Model |  1.07071245     1  1.07071245           Prob > F      =  0.0993
    Residual |  16.2191273    43  .377189007           R-squared     =  0.0619
-------------+------------------------------           Adj R-squared =  0.0401
       Total |  17.2898397    44  .392950903           Root MSE      =  .61416
------------------------------------------------------------------------------
     clscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     chrsemp |  -.0076007   .0045112    -1.68   0.099    -.0166984    .0014971
       _cons |  -.1035161    .103736    -1.00   0.324    -.3127197    .1056875
------------------------------------------------------------------------------


e. Any effect pretty much disappears using two years of differences (even though you can verify the rank condition easily holds):

ivreg clscrap d89 (chrsemp = cgrant)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =      91
-------------+------------------------------           F(  2,    88) =    0.90
       Model |  .538688387     2  .269344194           Prob > F      =  0.4087
    Residual |  33.2077492    88  .377360787           R-squared     =  0.0160
-------------+------------------------------           Adj R-squared = -0.0064
       Total |  33.7464376    90  .374960418           Root MSE      =   .6143
------------------------------------------------------------------------------
     clscrap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     chrsemp |  -.0028567   .0030577    -0.93   0.353    -.0089332    .0032198
         d89 |  -.1387379   .1296916    -1.07   0.288    -.3964728    .1189969
       _cons |  -.1548094   .0973592    -1.59   0.115    -.3482902    .0386715
------------------------------------------------------------------------------
Instrumented:  chrsemp
Instruments:   d89 cgrant

11.16. a. Just use fixed effects, or first differencing. Of course $w_i$ gets eliminated by either transformation.

b. Take the expectation of the structural equation conditional on $(w_i, x_i, r_i)$:

$$E(y_{it} \mid w_i, x_i, r_i) = \gamma w_i + x_{it}\beta + E(c_i \mid w_i, x_i, r_i) + E(u_{it} \mid w_i, x_i, r_i)$$

$$= \gamma w_i + x_{it}\beta + \delta_0 + \delta_1 r_i + \bar{x}_i\delta_2.$$

c. Provided a standard rank condition holds for the explanatory variables, $\gamma$ is identified because it appears in a conditional expectation containing observable variables: $E(y_{it} \mid w_i, x_i, r_i)$. The pooled OLS estimation

$y_{it}$ on $1, w_i, x_{it}, r_i, \bar{x}_i$, $t = 1, \ldots, T$; $i = 1, \ldots, N$

consistently estimates all parameters.


d. Following the hint, we can write

$$y_{it} = \delta_0 + \gamma w_i + x_{it}\beta + \delta_1 r_i + \bar{x}_i\delta_2 + a_i + u_{it}, \quad t = 1, \ldots, T, \qquad (11.104)$$

where $a_i = c_i - E(c_i \mid w_i, x_i, r_i)$. Under the assumptions given, the composite error, $v_{it} = a_i + u_{it}$, is easily shown to have a variance-covariance matrix that has the random effects form. In particular, $\mathrm{Var}(v_{it} \mid w_i, x_i, r_i) = \sigma_a^2 + \sigma_u^2$ and $\mathrm{Cov}(v_{it}, v_{is} \mid w_i, x_i, r_i) = \sigma_a^2$, $t \neq s$. [The arguments for obtaining these expressions should be familiar. For example, since $a_i$ is a function of $c_i$, $x_i$, and $r_i$, we can replace $c_i$ with $a_i$ in all of the assumptions concerning the first and second moments of $\{u_{it} : t = 1, \ldots, T\}$. Therefore,

$$E(a_i u_{it} \mid w_i, x_i, a_i, r_i) = a_i E(u_{it} \mid w_i, x_i, a_i, r_i) = 0$$

and so, by iterated expectations,

$$\mathrm{Cov}(a_i, u_{it} \mid w_i, x_i, r_i) = E(a_i u_{it} \mid w_i, x_i, r_i) = 0.]$$

We conclude that $\mathrm{Var}(v_i \mid w_i, x_i, r_i) = \mathrm{Var}(v_i)$ has the random effects form, and so we should just apply the usual random effects estimator to (11.104). This is asymptotically more efficient than the pooled OLS estimator.
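11.17. To obtain (11.81), we used (11.80) and the representation $\sqrt{N}(\hat\beta_{FE} - \beta) = \mathbf{A}^{-1}\big(N^{-1/2}\sum_{i=1}^N \ddot{\mathbf{X}}_i'\mathbf{u}_i\big) + o_p(1)$. Simple algebra and standard properties of $O_p(1)$ and $o_p(1)$ give

$$\begin{aligned}
\sqrt{N}(\hat\alpha - \alpha) &= N^{-1/2}\sum_{i=1}^N \big[(\mathbf{Z}_i'\mathbf{Z}_i)^{-1}\mathbf{Z}_i'(\mathbf{y}_i - \mathbf{X}_i\beta) - \alpha\big] - \Big[N^{-1}\sum_{i=1}^N (\mathbf{Z}_i'\mathbf{Z}_i)^{-1}\mathbf{Z}_i'\mathbf{X}_i\Big]\sqrt{N}(\hat\beta_{FE} - \beta) \\
&= N^{-1/2}\sum_{i=1}^N (\mathbf{s}_i - \alpha) - \mathbf{C}\mathbf{A}^{-1}N^{-1/2}\sum_{i=1}^N \ddot{\mathbf{X}}_i'\mathbf{u}_i + o_p(1),
\end{aligned}$$

where $\mathbf{C} = E[(\mathbf{Z}_i'\mathbf{Z}_i)^{-1}\mathbf{Z}_i'\mathbf{X}_i]$ and $\mathbf{s}_i = (\mathbf{Z}_i'\mathbf{Z}_i)^{-1}\mathbf{Z}_i'(\mathbf{y}_i - \mathbf{X}_i\beta)$. By definition, $E(\mathbf{s}_i) = \alpha$. By combining terms in the sum we have

$$\sqrt{N}(\hat\alpha - \alpha) = N^{-1/2}\sum_{i=1}^N \big[(\mathbf{s}_i - \alpha) - \mathbf{C}\mathbf{A}^{-1}\ddot{\mathbf{X}}_i'\mathbf{u}_i\big] + o_p(1),$$

which implies by the central limit theorem and the asymptotic equivalence lemma that $\sqrt{N}(\hat\alpha - \alpha)$ is asymptotically normal with zero mean and variance $E(\mathbf{r}_i\mathbf{r}_i')$, where $\mathbf{r}_i = (\mathbf{s}_i - \alpha) - \mathbf{C}\mathbf{A}^{-1}\ddot{\mathbf{X}}_i'\mathbf{u}_i$. If we replace $\alpha$, $\mathbf{C}$, $\mathbf{A}$, and $\beta$ with their consistent estimators, we get exactly (11.81) because the $\hat{\mathbf{u}}_i$ are the $T \times 1$ FE residuals.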


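11.18. a. Using equation (8.47) we have

$$\hat\beta_{REIV} = \Big[\Big(\sum_{i=1}^N \mathbf{X}_i'\hat\Omega^{-1}\mathbf{Z}_i\Big)\Big(\sum_{i=1}^N \mathbf{Z}_i'\hat\Omega^{-1}\mathbf{Z}_i\Big)^{-1}\Big(\sum_{i=1}^N \mathbf{Z}_i'\hat\Omega^{-1}\mathbf{X}_i\Big)\Big]^{-1}\Big(\sum_{i=1}^N \mathbf{X}_i'\hat\Omega^{-1}\mathbf{Z}_i\Big)\Big(\sum_{i=1}^N \mathbf{Z}_i'\hat\Omega^{-1}\mathbf{Z}_i\Big)^{-1}\Big(\sum_{i=1}^N \mathbf{Z}_i'\hat\Omega^{-1}\mathbf{y}_i\Big),$$

where $\hat\Omega$ has the RE form (and is probably estimated from the pooled 2SLS residuals).

By arguments very similar to those for FGLS, we can show

$$\sqrt{N}(\hat\beta_{REIV} - \beta) = \mathbf{A}^{-1}\mathbf{C}'\mathbf{D}^{-1}\Big(N^{-1/2}\sum_{i=1}^N \mathbf{Z}_i'\Lambda^{-1}\mathbf{u}_i\Big) + o_p(1),$$

where

$$\Lambda = \mathrm{plim}(\hat\Omega), \quad \mathbf{C} = E(\mathbf{Z}_i'\Lambda^{-1}\mathbf{X}_i), \quad \mathbf{D} = E(\mathbf{Z}_i'\Lambda^{-1}\mathbf{Z}_i), \quad \mathbf{A} = \mathbf{C}'\mathbf{D}^{-1}\mathbf{C}.$$

Note that this formulation recognizes that $\hat\Omega$ is not generally consistent for $E(\mathbf{v}_i\mathbf{v}_i')$. It follows that

$$\sqrt{N}(\hat\beta_{REIV} - \beta) \overset{d}{\to} \mathrm{Normal}(\mathbf{0}, \mathbf{A}^{-1}\mathbf{B}\mathbf{A}^{-1}),$$

where

$$\mathbf{B} = \mathbf{C}'\mathbf{D}^{-1}E(\mathbf{Z}_i'\Lambda^{-1}\mathbf{u}_i\mathbf{u}_i'\Lambda^{-1}\mathbf{Z}_i)\mathbf{D}^{-1}\mathbf{C}.$$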


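b. Consistent estimators of $\mathbf{A}$ and $\mathbf{B}$ are the sample analogs

$$\hat{\mathbf{A}} = \hat{\mathbf{C}}'\hat{\mathbf{D}}^{-1}\hat{\mathbf{C}}, \qquad \hat{\mathbf{B}} = \hat{\mathbf{C}}'\hat{\mathbf{D}}^{-1}\Big(N^{-1}\sum_{i=1}^N \mathbf{Z}_i'\hat\Omega^{-1}\hat{\mathbf{u}}_i\hat{\mathbf{u}}_i'\hat\Omega^{-1}\mathbf{Z}_i\Big)\hat{\mathbf{D}}^{-1}\hat{\mathbf{C}},$$

where

$$\hat{\mathbf{C}} = N^{-1}\sum_{i=1}^N \mathbf{Z}_i'\hat\Omega^{-1}\mathbf{X}_i, \qquad \hat{\mathbf{D}} = N^{-1}\sum_{i=1}^N \mathbf{Z}_i'\hat\Omega^{-1}\mathbf{Z}_i, \qquad \hat{\mathbf{u}}_i = \mathbf{y}_i - \mathbf{X}_i\hat\beta_{REIV}.$$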
11.19. a. Below is the Stata output. The concen variable is positive and statistically significant using both RE and FE estimation of the reduced form, and using fully robust (that is, robust to any serial correlation and heteroskedasticity) standard errors. The coefficient is somewhat larger for RE compared with FE, and its standard error is somewhat smaller for RE. We conclude that concen is suitably partially correlated with lfare in order to apply REIV and FEIV.

. xtreg lfare concen ldist ldistsq y98 y99 y00, re cluster(id)


Random-effects GLS regression                   Number of obs      =      4596
Group variable: id                              Number of groups   =      1149

R-sq:  within  = 0.1348                         Obs per group: min =         4
       between = 0.4176                                        avg =       4.0
       overall = 0.4030                                        max =         4

Random effects u_i ~ Gaussian                   Wald chi2(7)       = 386792.48
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

                  (Std. Err. adjusted for 1149 clusters in id)
-------------------------------------------------------------
             |               Robust
       lfare |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------
      concen |   .2089935   .0422459      .126193    .2917939
       ldist |  -.8520921   .2720902    -1.385379   -.3188051
     ldistsq |   .0974604   .0201417     .0579833    .1369375
         y98 |   .0224743   .0041461      .014348    .0306005
         y99 |   .0366898   .0051318     .0266317     .046748
         y00 |    .098212   .0055241     .0873849     .109039
       _cons |   6.222005   .9144067     4.429801    8.014209
-------------+-----------------------------------------------
     sigma_u |  .31933841
     sigma_e |  .10651186
         rho |  .89988885   (fraction of variance due to u_i)


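. xtreg lfare concen y98 y99 y00, fe cluster(id)

Fixed-effects (within) regression               Number of obs      =      4596
Group variable: id                              Number of groups   =      1149

R-sq:  within  = 0.1352                         Obs per group: min =         4
       between = 0.0576                                        avg =       4.0
       overall = 0.0083                                        max =         4

                                                F(4,1148)          =    120.06
corr(u_i, Xb)  = -0.2033                        Prob > F           =    0.0000

                  (Std. Err. adjusted for 1149 clusters in id)
-------------------------------------------------------------
             |               Robust
       lfare |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------
      concen |    .168859   .0494587     .0718194    .2658985
         y98 |   .0228328    .004163     .0146649    .0310007
         y99 |   .0363819   .0051275     .0263215    .0464422
         y00 |   .0977717   .0055054     .0869698    .1085735
       _cons |   4.953331   .0296765     4.895104    5.011557
-------------+-----------------------------------------------
     sigma_u |  .43389176
     sigma_e |  .10651186
         rho |  .94316439   (fraction of variance due to u_i)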


b. The REIV estimates without ldist and ldistsq from Stata are given below. For comparison, the REIV estimates in Example 11.1 are also reported. Dropping the distance variables changes the estimated elasticity to −.654, which is notably larger in magnitude than −.508. This is a good example of how relevant time-constant variables, when they are available, should be controlled for in an RE analysis.


. xtivreg lpassen y98 y99 y00 (lfare=concen), re

G2SLS random-effects IV regression              Number of obs      =      4596
Group variable: id                              Number of groups   =      1149

R-sq:  within  = 0.4327                         Obs per group: min =         4
       between = 0.0487                                        avg =       4.0
       overall = 0.0578                                        max =         4

                                                Wald chi2(4)       =    219.33
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     lpassen |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lfare |  -.6540984   .4019123    -1.63   0.104    -1.441832    .1336351
         y98 |   .0342955    .011701     2.93   0.003      .011362     .057229
         y99 |   .0847852   .0154938     5.47   0.000     .0544178    .1151525
         y00 |    .146605   .0390819     3.75   0.000      .070006    .2232041
       _cons |    9.28363   2.032528     4.57   0.000     5.299949    13.26731
-------------+----------------------------------------------------------------
     sigma_u |  .91384976
     sigma_e |  .16964171
         rho |   .9666879   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented:   lfare
Instruments:    y98 y99 y00 concen


. xtivreg lpassen ldist ldistsq y98 y99 y00 (lfare=concen), re

G2SLS random-effects IV regression              Number of obs      =      4596
Group variable: id                              Number of groups   =      1149

R-sq:  within  = 0.4075                         Obs per group: min =         4
       between = 0.0542                                        avg =       4.0
       overall = 0.0641                                        max =         4

                                                Wald chi2(6)       =    231.10
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     lpassen |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lfare |  -.5078762    .229698    -2.21   0.027     -.958076   -.0576763
       ldist |  -1.504806   .6933147    -2.17   0.030    -2.863678   -.1459338
     ldistsq |   .1176013   .0546255     2.15   0.031     .0105373    .2246652
         y98 |   .0307363   .0086054     3.57   0.000     .0138699    .0476027
         y99 |   .0796548     .01038     7.67   0.000     .0593104    .0999992
         y00 |   .1325795   .0229831     5.77   0.000     .0875335    .1776255
       _cons |   13.29643   2.626949     5.06   0.000     8.147709    18.44516
-------------+----------------------------------------------------------------
     sigma_u |  .94920686
     sigma_e |  .16964171
         rho |  .96904799   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented:   lfare
Instruments:    ldist ldistsq y98 y99 y00 concen


c. Now we have three endogenous variables: lfare, $(ldist - \mu_1)\cdot lfare$, and $(ldist^2 - \mu_2)\cdot lfare$. We can use concen, $(ldist - \mu_1)\cdot concen$, and $(ldist^2 - \mu_2)\cdot concen$ as instruments. In other words, we add the interactions $(ldist - \mu_1)\cdot concen$ and $(ldist^2 - \mu_2)\cdot concen$ as extra IVs to account for the endogenous interactions in the structural model.

In practice, we replace $\mu_1$ and $\mu_2$ with the sample averages.

d. The Stata output below provides the estimates. Something interesting happens here. The REIV and FEIV estimates of the coefficient on lfare are now much closer to each other, and much larger in magnitude than the estimates in Table 11.1. In particular, the estimated elasticity at the mean of ldist and ldistsq is about −1 for REIV and FEIV. Interestingly, the REIV and FEIV estimates with the interactions are close to the RE and FE estimates without the interactions.

. egen mu_ldist = mean(ldist)
. gen dmldist = ldist-mu_ldist
. egen mu_ldistsq = mean(ldistsq)
. gen dmldistsq = ldistsq-mu_ldistsq
. gen ldist_lfare = dmldist*lfare
. gen ldistsq_lfare = dmldistsq*lfare
. gen ldist_concen = dmldist*concen
. gen ldistsq_concen = dmldistsq*concen

. xtivreg lpassen ldist ldistsq y98 y99 y00 (lfare ldist_lfare ldistsq_lfare = concen ldist_concen ldistsq_concen), re


G2SLS random-effects IV regression              Number of obs      =      4596
Group variable: id                              Number of groups   =      1149

R-sq:  within  = 0.1319                         Obs per group: min =         4
       between = 0.0006                                        avg =       4.0
       overall = 0.0016                                        max =         4

                                                Wald chi2(8)       =    180.72
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

-------------------------------------------------------------
     lpassen |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------
       lfare |  -1.048873   .3250545    -1.685969   -.4117783
 ldist_lfare |   29.63707   7.957828     14.04001    45.23413
ldistsq_lf~e |  -2.330287    .638173    -3.581083   -1.079491
       ldist |  -157.8477   42.76284    -241.6613   -74.03409
     ldistsq |   12.45005   3.437782     5.712121    19.18798
         y98 |   .0319578   .0105546     .0112713    .0526444
         y99 |    .080002   .0127579     .0549969    .1050071
         y00 |   .1570325    .026578     .1049406    .2091244
       _cons |   504.8691   131.4462     247.2392     762.499
-------------+-----------------------------------------------
     sigma_u |  1.3686882
     sigma_e |  .19436268
         rho |  .98023276   (fraction of variance due to u_i)
-------------------------------------------------------------
Instrumented:   lfare ldist_lfare ldistsq_lfare
Instruments:    ldist ldistsq y98 y99 y00 concen ldist_concen ldistsq_concen


. xtivreg lpassen y98 y99 y00 (lfare ldist_lfare ldistsq_lfare = concen ldist_concen ldistsq_concen), fe

Fixed-effects (within) IV regression            Number of obs      =      4596
Group variable: id                              Number of groups   =      1149

R-sq:  within  =      .                         Obs per group: min =         4
       between = 0.0016                                        avg =       4.0
       overall = 0.0016                                        max =         4

                                                Wald chi2(6)       =  4.40e+06
corr(u_i, Xb)  = -0.9913                        Prob > chi2        =    0.0000

-------------------------------------------------------------
     lpassen |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------
       lfare |  -1.011863   .3214187    -1.641832   -.3818937
 ldist_lfare |   24.11579   6.951145      10.4918    37.73979
ldistsq_lf~e |  -1.905021   .5593222    -3.001273   -.8087699
         y98 |   .0322146   .0102786     .0120689    .0523603
         y99 |    .080772   .0123315     .0566026    .1049414
         y00 |    .155485   .0260008     .1045244    .2064456
       _cons |   11.33584   1.694321     8.015032    14.65665
-------------+-----------------------------------------------
     sigma_u |  6.6845875
     sigma_e |  .19436268
         rho |  .99915529   (fraction of variance due to u_i)
-------------------------------------------------------------
F test that all u_i=0:
-------------------------------------------------------------
Instrumented:   lfare ldist_lfare ldistsq_lfare
Instruments:    ldist ldistsq y98 y99 y00 concen ldist_concen ldistsq_concen


e. We can use the command xtivreg2, a user-written program for Stata. The 95% confidence interval for $\alpha_1$ is $[-2.408, .385]$, which includes zero. The fully robust joint test of the two interaction terms gives p-value = .101, so we might be justified in dropping them.


robust standard error 


. xtivreg2 lpassen y98 y99 y00 (lfare ldist_lfare ldistsq_lfare = concen ldist_concen ldistsq_concen), fe cluster(id)

FIXED EFFECTS ESTIMATION

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on id

Number of clusters (id) =          1149
Total (centered) SS     =   128.0991685
Total (uncentered) SS   =   128.0991685
Residual SS             =   129.9901441

---------------------------------------------
     lpassen |      Coef.    [95% Conf. lower]
-------------+-------------------------------
       lfare |  -1.011863       -2.408321
 ldist_lfare |   24.11579         2.03166
ldistsq_lf~e |  -1.905021       -3.657614
         y98 |                  -.0006613
         y99 |                   .0296053
         y00 |                   .0328515
---------------------------------------------
Instrumented:         lfare ldist_lfare ldistsq_lfare
Included instruments: y98 y99 y00
Excluded instruments: concen ldist_concen ldistsq_concen

. test ldist_lfare ldistsq_lfare

 ( 1)  ldist_lfare = 0
 ( 2)  ldistsq_lfare = 0

           chi2(  2) =    4.59
         Prob > chi2 =    0.1008


f. In general, the estimated elasticities can be obtained from

$$\frac{\partial\,\widehat{lpassen}}{\partial\, lfare} = \hat\alpha_1 + \hat\gamma_1(ldist - \hat\mu_1) + \hat\gamma_2(ldist^2 - \hat\mu_2)$$

for any value of ldist. Calculations are given below. For dist = 500 the estimated elasticity is about .047 with a very small t statistic. For dist = 1,500, the estimated elasticity is −1.77 with fully robust t = −1.55. So the magnitude of the elasticity increases substantially as the route distance increases, but the estimates contain substantial noise.


. sum ldist ldistsq if y00

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       ldist |      1149    6.696482    .6595331   4.553877   7.909857
     ldistsq |      1149    45.27747    8.729749   20.73779   62.56583

. di log(500) - 6.696482
-.4818739

. di (log(500))^2 - 45.27747
-6.6561162

. lincom lfare - .4818739*ldist_lfare - 6.6561162*ldistsq_lfare

 ( 1)  lfare - .4818739*ldist_lfare - 6.656116*ldistsq_lfare = 0

. di log(1500) - 6.696482
.61673839

. di (log(1500))^2 - 45.27747
8.2057224

. lincom lfare + .61673839*ldist_lfare + 8.2057224*ldistsq_lfare

 ( 1)  lfare + .6167384*ldist_lfare + 8.205722*ldistsq_lfare = 0
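As a cross-check on the arithmetic in part f, the point estimates of the elasticity can be reproduced outside Stata. The following is only a sketch in Python: the coefficient values are copied from the FEIV output in part d and the means from the sum output above, while the function name `elasticity` is made up for this illustration. It recovers the point estimates only; the standard errors require the estimated covariance matrix that lincom uses.

```python
import math

# FEIV point estimates copied from the xtivreg ..., fe output in part d.
ALPHA1 = -1.011863   # coefficient on lfare
GAMMA1 = 24.11579    # coefficient on ldist_lfare
GAMMA2 = -1.905021   # coefficient on ldistsq_lfare

# Sample means of ldist and ldistsq from "sum ldist ldistsq if y00".
MU_LDIST = 6.696482
MU_LDISTSQ = 45.27747


def elasticity(dist):
    """Estimated elasticity of passengers with respect to fare at distance `dist`."""
    ldist = math.log(dist)
    return (ALPHA1
            + GAMMA1 * (ldist - MU_LDIST)
            + GAMMA2 * (ldist ** 2 - MU_LDISTSQ))


for d in (500, 1500):
    print(d, round(elasticity(d), 3))
```

The two printed values should agree, up to rounding, with the elasticities of about .047 and −1.77 quoted in part f.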
Solutions to Chapter 12 Problems

12.1. a. Take the conditional expectation of equation (12.4) with respect to $\mathbf{x}$, and use $E(u|\mathbf{x}) = 0$:

$$\begin{aligned}
E\{[y - m(\mathbf{x},\theta)]^2|\mathbf{x}\} &= E(u^2|\mathbf{x}) + 2[m(\mathbf{x},\theta_o) - m(\mathbf{x},\theta)]E(u|\mathbf{x}) + E\{[m(\mathbf{x},\theta_o) - m(\mathbf{x},\theta)]^2|\mathbf{x}\} \\
&= E(u^2|\mathbf{x}) + 0 + [m(\mathbf{x},\theta_o) - m(\mathbf{x},\theta)]^2 \\
&= E(u^2|\mathbf{x}) + [m(\mathbf{x},\theta_o) - m(\mathbf{x},\theta)]^2.
\end{aligned}$$

The first term does not depend on $\theta$ and the second term is clearly minimized at $\theta = \theta_o$ for any $\mathbf{x}$. Therefore, the parameters of a correctly specified conditional mean function minimize the squared error conditional on any value of $\mathbf{x}$.


b. Part a shows that

$$E\{[y - m(\mathbf{x},\theta_o)]^2|\mathbf{x}\} \le E\{[y - m(\mathbf{x},\theta)]^2|\mathbf{x}\}, \quad \text{all } \theta \in \Theta, \text{ all } \mathbf{x} \in \mathcal{X}.$$

If we take the expected value of both sides (with respect to the distribution of $\mathbf{x}$, of course) and apply iterated expectations, we conclude

$$E\{[y - m(\mathbf{x},\theta_o)]^2\} \le E\{[y - m(\mathbf{x},\theta)]^2\}, \quad \text{all } \theta \in \Theta.$$

In other words, if we know $\theta_o$ solves the population minimization problem conditional on any $\mathbf{x}$, then it also solves the unconditional population problem. Of course, conditional on a particular value of $\mathbf{x}$, $\theta_o$ would usually not be the unique solution. (For example, in the linear case $m(\mathbf{x},\theta) = \mathbf{x}\theta$, any $\theta$ such that $\mathbf{x}(\theta_o - \theta) = 0$ sets $m(\mathbf{x},\theta_o) - m(\mathbf{x},\theta)$ to zero.) Uniqueness of $\theta_o$ as a population minimizer is realistic only after we integrate out $\mathbf{x}$ to obtain $E\{[y - m(\mathbf{x},\theta)]^2\}$.
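The decomposition in part a can be checked numerically on a toy population. The sketch below is an illustration only: the linear mean $m(x,\theta) = x\theta$, the value `THETA_O = 2.0`, and the discrete distributions of x and u are all made-up assumptions, chosen so that every expectation is an exact finite average.

```python
THETA_O = 2.0                 # hypothetical true parameter
X_VALUES = (1.0, 2.0, 3.0)    # x is uniform on these three points
U_VALUES = (-1.0, 1.0)        # u is +1 or -1 with equal probability, so E(u|x) = 0


def cond_mse(theta, x):
    """E{[y - x*theta]^2 | x} when y = x*THETA_O + u."""
    return sum((x * THETA_O + u - x * theta) ** 2 for u in U_VALUES) / len(U_VALUES)


def pop_mse(theta):
    """Unconditional E{[y - x*theta]^2}, averaging the conditional MSE over x."""
    return sum(cond_mse(theta, x) for x in X_VALUES) / len(X_VALUES)


grid = [i / 10 for i in range(0, 41)]   # theta = 0.0, 0.1, ..., 4.0
best = min(grid, key=pop_mse)
print(best, pop_mse(best))
```

On this grid the minimizer is the true value, and the minimized objective equals $E(u^2) = 1$, which is the first term in the decomposition.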


12.2. a. Since $u = y - E(y|\mathbf{x})$,

$$\mathrm{Var}(y|\mathbf{x}) = \mathrm{Var}(u|\mathbf{x}) = E(u^2|\mathbf{x})$$

because $E(u|\mathbf{x}) = 0$. So $E(u^2|\mathbf{x}) = \exp(\alpha_o + \mathbf{x}\gamma_o)$.

b. If we knew the $u_i = y_i - m(\mathbf{x}_i,\theta_o)$, then we could do a nonlinear regression of $u_i^2$ on $\exp(\alpha + \mathbf{x}_i\gamma)$ and just use the asymptotic theory for nonlinear regression. The NLS estimators of $\alpha$ and $\gamma$ would then solve

$$\min_{\alpha,\gamma} \sum_{i=1}^N [u_i^2 - \exp(\alpha + \mathbf{x}_i\gamma)]^2.$$

The problem is that $\theta_o$ is unknown. When we replace $\theta_o$ with its NLS estimator, $\hat\theta$ (that is, we replace $u_i^2$ with $\hat u_i^2$, the squared NLS residuals), we are solving the problem

$$\min_{\alpha,\gamma} \sum_{i=1}^N \{[y_i - m(\mathbf{x}_i,\hat\theta)]^2 - \exp(\alpha + \mathbf{x}_i\gamma)\}^2.$$

This objective function has the form of a two-step M-estimator in Section 12.4. Since $\hat\theta$ is generally consistent for $\theta_o$, the two-step M-estimator is generally consistent for $\alpha_o$ and $\gamma_o$ (under weak regularity and identification conditions). In fact, $\sqrt{N}$-consistency of $\hat\alpha$ and $\hat\gamma$ holds very generally.

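c. We now estimate $\theta_o$ by solving

$$\min_{\theta} \sum_{i=1}^N [y_i - m(\mathbf{x}_i,\theta)]^2/\exp(\hat\alpha + \mathbf{x}_i\hat\gamma),$$

where $\hat\alpha$ and $\hat\gamma$ are from part b. The general theory of WNLS under WNLS.1 to WNLS.3 can be applied.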

d. Using the definition of $v$, write $u^2 = \exp(\alpha_o + \mathbf{x}\gamma_o)v^2$. Taking logs gives $\log(u^2) = \alpha_o + \mathbf{x}\gamma_o + \log(v^2)$. Now, if $v$ is independent of $\mathbf{x}$, so is $\log(v^2)$. Therefore, $E[\log(u^2)|\mathbf{x}] = \alpha_o + \mathbf{x}\gamma_o + E[\log(v^2)|\mathbf{x}] = \alpha_o + \mathbf{x}\gamma_o + \kappa_o$, where $\kappa_o = E[\log(v^2)]$. So, if we could observe the $u_i$, an OLS regression of $\log(u_i^2)$ on $1, \mathbf{x}_i$ would be consistent for $(\alpha_o + \kappa_o, \gamma_o)$; in fact, it would be unbiased. By two-step estimation theory, consistency still holds if $u_i$ is replaced with $\hat u_i$, by essentially the same argument as in part b. So, if $m(\mathbf{x},\theta)$ is linear in $\theta$, we can carry out a weighted NLS procedure without ever doing nonlinear estimation.

e. If we have misspecified the variance function (or, for example, we use the approach in part d but $v$ is not independent of $\mathbf{x}$) then we should use a fully robust variance-covariance matrix in equation (12.60) with $\hat h_i = \exp(\hat\alpha + \mathbf{x}_i\hat\gamma)$.
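The first two steps of the procedure in part d can be sketched with simulated data. Everything in the block below is an assumption made for illustration: a scalar regressor, a linear mean, standard normal $v$ independent of $x$, the parameter values, and the helper `ols`. It is not the procedure applied to any data set in the text.

```python
import math
import random

random.seed(12345)

# Hypothetical data-generating values.
THETA0, THETA1 = 1.0, 2.0      # linear mean: E(y|x) = THETA0 + THETA1*x
ALPHA_O, GAMMA_O = -1.0, 1.5   # Var(y|x) = exp(ALPHA_O + GAMMA_O*x)
N = 20000


def ols(x, y):
    """Intercept and slope from a simple regression of y on (1, x)."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


x = [random.random() for _ in range(N)]
v = [random.gauss(0.0, 1.0) for _ in range(N)]   # v is independent of x
y = [THETA0 + THETA1 * a + math.exp((ALPHA_O + GAMMA_O * a) / 2) * b
     for a, b in zip(x, v)]

# Step 1: OLS for the (linear) mean, then residuals.
b0, b1 = ols(x, y)
uhat = [c - b0 - b1 * a for a, c in zip(x, y)]

# Step 2: regress log(uhat^2) on (1, x). The slope estimates GAMMA_O; the
# intercept estimates ALPHA_O + E[log(v^2)], not ALPHA_O itself.
a_hat, g_hat = ols(x, [math.log(e * e) for e in uhat])

print(round(b1, 3), round(g_hat, 3))
```

Because the weights $\exp(\hat\alpha + \mathbf{x}_i\hat\gamma)$ matter only up to a common scale factor, the shifted intercept is harmless for the weighted least squares step that would follow.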

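12.3. a. The approximate elasticity is

$$\partial\log[\hat E(y|\mathbf{z})]/\partial\log(z_1) = \partial[\hat\theta_1 + \hat\theta_2\log(z_1) + \hat\theta_3 z_2]/\partial\log(z_1) = \hat\theta_2.$$

b. This is approximated by $100\cdot\partial\log[\hat E(y|\mathbf{z})]/\partial z_2 = 100\cdot\hat\theta_3$.

c. Since $\partial\hat E(y|\mathbf{z})/\partial z_2 = \exp[\hat\theta_1 + \hat\theta_2\log(z_1) + \hat\theta_3 z_2 + \hat\theta_4 z_2^2]\cdot(\hat\theta_3 + 2\hat\theta_4 z_2)$, the estimated turning point is $\hat z_2^* = \hat\theta_3/(-2\hat\theta_4)$. This is a consistent estimator of $z_2^* = \theta_3/(-2\theta_4)$.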
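The turning point in part c is a one-line calculation once the estimates are in hand. The sketch below uses made-up coefficient values and a hypothetical helper name `turning_point`; it also confirms that the factor $(\hat\theta_3 + 2\hat\theta_4 z_2)$ in the derivative vanishes at the computed point.

```python
def turning_point(theta3, theta4):
    """Value of z2 at which dE(y|z)/dz2 changes sign: theta3 / (-2*theta4)."""
    if theta4 == 0:
        raise ValueError("no turning point when the quadratic coefficient is zero")
    return theta3 / (-2.0 * theta4)


# Hypothetical estimates, for illustration only.
theta3_hat, theta4_hat = 0.08, -0.002

z2_star = turning_point(theta3_hat, theta4_hat)
slope_factor = theta3_hat + 2.0 * theta4_hat * z2_star   # should be (numerically) zero
print(z2_star, slope_factor)
```

With a positive linear coefficient and a negative quadratic coefficient, as assumed here, the mean function increases in $z_2$ up to the turning point and decreases afterward.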

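d. Since $\nabla_\theta m(\mathbf{x},\theta) = \exp(\mathbf{x}_1\theta_1 + \mathbf{x}_2\theta_2)\mathbf{x}$, the gradient of the mean function evaluated under the null is

$$\nabla_\theta\tilde m_i = \exp(\mathbf{x}_{i1}\tilde\theta_1)\mathbf{x}_i \equiv \tilde m_i\mathbf{x}_i,$$

where $\tilde\theta_1$ is the restricted NLS estimator. From regression (12.72), we can compute the usual LM statistic as $NR_u^2$ from the regression $\tilde u_i$ on $\tilde m_i\mathbf{x}_{i1}, \tilde m_i\mathbf{x}_{i2}$, $i = 1,\dots,N$, where $\tilde u_i = y_i - \tilde m_i$. For the robust test, we first regress $\tilde m_i\mathbf{x}_{i2}$ on $\tilde m_i\mathbf{x}_{i1}$ and obtain the $1 \times K_2$ residuals, $\tilde{\mathbf{r}}_i$. Then we compute the statistic as in regression (12.75).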
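12.4. a. Write the objective function as $(1/2)\sum_{i=1}^N [y_i - m(\mathbf{x}_i,\theta)]^2/h(\mathbf{x}_i,\hat\gamma)$. The objective function, for any value of $\gamma$, is

$$q(\mathbf{w}_i,\theta;\gamma) = (1/2)[y_i - m(\mathbf{x}_i,\theta)]^2/h(\mathbf{x}_i,\gamma).$$

Taking the gradient with respect to $\theta$ gives

$$\begin{aligned}
\nabla_\theta q(\mathbf{w}_i,\theta;\gamma) &= -\nabla_\theta m(\mathbf{x}_i,\theta)[y_i - m(\mathbf{x}_i,\theta)]/h(\mathbf{x}_i,\gamma) \\
&= -\nabla_\theta m(\mathbf{x}_i,\theta)u_i(\theta)/h(\mathbf{x}_i,\gamma).
\end{aligned}$$

Taking the transpose gives us the score with respect to $\theta$ for any $\theta$ and any $\gamma$.

b. This follows because, under WNLS.1, $u_i \equiv u_i(\theta_o)$ has a zero mean given $\mathbf{x}_i$:

$$E[\mathbf{s}_i(\theta_o;\gamma)|\mathbf{x}_i] = -\nabla_\theta m(\mathbf{x}_i,\theta_o)'E(u_i|\mathbf{x}_i)/h(\mathbf{x}_i,\gamma) = \mathbf{0};$$

the value of $\gamma$ plays no role.

c. First, the Jacobian of $\mathbf{s}_i(\theta_o;\gamma)$ with respect to $\gamma$ is $\nabla_\gamma\mathbf{s}_i(\theta_o;\gamma) = \nabla_\theta m(\mathbf{x}_i,\theta_o)'u_i\nabla_\gamma h(\mathbf{x}_i,\gamma)/[h(\mathbf{x}_i,\gamma)]^2$. Everything but $u_i$ is a function only of $\mathbf{x}_i$, so

$$E[\nabla_\gamma\mathbf{s}_i(\theta_o;\gamma)|\mathbf{x}_i] = \nabla_\theta m(\mathbf{x}_i,\theta_o)'E(u_i|\mathbf{x}_i)\nabla_\gamma h(\mathbf{x}_i,\gamma)/[h(\mathbf{x}_i,\gamma)]^2 = \mathbf{0}.$$

It follows by the LIE that the unconditional expectation is zero, too. In other words, we have shown that the key condition (12.37) holds (and we did not rely on Assumption WNLS.3).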
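d. We would just use equation (12.60), which can be written as

$$\widehat{\mathrm{Avar}}(\hat\theta) = \Big(\sum_{i=1}^N \nabla_\theta\breve m_i'\nabla_\theta\breve m_i\Big)^{-1}\Big(\sum_{i=1}^N \breve u_i^2\nabla_\theta\breve m_i'\nabla_\theta\breve m_i\Big)\Big(\sum_{i=1}^N \nabla_\theta\breve m_i'\nabla_\theta\breve m_i\Big)^{-1},$$

where $\breve u_i = \hat u_i/\hat h_i^{1/2}$ and $\nabla_\theta\breve m_i = \nabla_\theta\hat m_i/\hat h_i^{1/2}$ are the standardized residuals and gradient, respectively.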


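e. Under Assumption WNLS.3 (along with WNLS.1),

$$\mathrm{Var}(y_i|\mathbf{x}_i) = E(u_i^2|\mathbf{x}_i) = \sigma_o^2 h(\mathbf{x}_i,\gamma_o),$$

and $\hat\gamma$ is $\sqrt{N}$-consistent for $\gamma_o$. This ensures that the asymptotic variance of $\sqrt{N}(\hat\theta - \theta_o)$ does not depend on that of $\sqrt{N}(\hat\gamma - \gamma_o)$. Further,

$$\begin{aligned}
\mathbf{A}_o &= E[\nabla_\theta m(\mathbf{x}_i,\theta_o)'\nabla_\theta m(\mathbf{x}_i,\theta_o)/h(\mathbf{x}_i,\gamma_o)] \\
\mathbf{B}_o &= E[\mathbf{s}_i(\theta_o;\gamma_o)\mathbf{s}_i(\theta_o;\gamma_o)'] = E\{u_i^2\nabla_\theta m(\mathbf{x}_i,\theta_o)'\nabla_\theta m(\mathbf{x}_i,\theta_o)/[h(\mathbf{x}_i,\gamma_o)]^2\}.
\end{aligned}$$

By iterated expectations and WNLS.3,

$$\begin{aligned}
E\{u_i^2\nabla_\theta m(\mathbf{x}_i,\theta_o)'\nabla_\theta m(\mathbf{x}_i,\theta_o)/[h(\mathbf{x}_i,\gamma_o)]^2\}
&= E\big(E\{u_i^2\nabla_\theta m(\mathbf{x}_i,\theta_o)'\nabla_\theta m(\mathbf{x}_i,\theta_o)/[h(\mathbf{x}_i,\gamma_o)]^2\,|\,\mathbf{x}_i\}\big) \\
&= E\{E(u_i^2|\mathbf{x}_i)\nabla_\theta m(\mathbf{x}_i,\theta_o)'\nabla_\theta m(\mathbf{x}_i,\theta_o)/[h(\mathbf{x}_i,\gamma_o)]^2\} \\
&= E\{\sigma_o^2 h(\mathbf{x}_i,\gamma_o)\nabla_\theta m(\mathbf{x}_i,\theta_o)'\nabla_\theta m(\mathbf{x}_i,\theta_o)/[h(\mathbf{x}_i,\gamma_o)]^2\} \\
&= \sigma_o^2 E[\nabla_\theta m(\mathbf{x}_i,\theta_o)'\nabla_\theta m(\mathbf{x}_i,\theta_o)/h(\mathbf{x}_i,\gamma_o)] \\
&= \sigma_o^2\mathbf{A}_o.
\end{aligned}$$

Therefore,

$$\mathrm{Avar}[\sqrt{N}(\hat\theta - \theta_o)] = \sigma_o^2\mathbf{A}_o^{-1}$$

and a consistent estimator is

$$\hat\sigma^2\Big[N^{-1}\sum_{i=1}^N \nabla_\theta m(\mathbf{x}_i,\hat\theta)'\nabla_\theta m(\mathbf{x}_i,\hat\theta)/h(\mathbf{x}_i,\hat\gamma)\Big]^{-1}.$$

Dividing this expression by $N$ to get $\widehat{\mathrm{Avar}}(\hat\theta)$ delivers (12.59).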
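12.5. a. We need the gradient of $m(\mathbf{x}_i,\theta)$ evaluated under the null hypothesis. By the chain rule,

$$\begin{aligned}
\nabla_\beta m(\mathbf{x},\theta) &= g[\mathbf{x}\beta + \delta_1(\mathbf{x}\beta)^2 + \delta_2(\mathbf{x}\beta)^3]\cdot[\mathbf{x} + 2\delta_1(\mathbf{x}\beta)\mathbf{x} + 3\delta_2(\mathbf{x}\beta)^2\mathbf{x}], \\
\nabla_\delta m(\mathbf{x},\theta) &= g[\mathbf{x}\beta + \delta_1(\mathbf{x}\beta)^2 + \delta_2(\mathbf{x}\beta)^3]\cdot[(\mathbf{x}\beta)^2, (\mathbf{x}\beta)^3].
\end{aligned}$$

The gradients with $\delta_1 = \delta_2 = 0$ are

$$\begin{aligned}
\nabla_\beta m(\mathbf{x},\beta,\mathbf{0}) &= g(\mathbf{x}\beta)\cdot\mathbf{x}, \\
\nabla_\delta m(\mathbf{x},\beta,\mathbf{0}) &= g(\mathbf{x}\beta)\cdot[(\mathbf{x}\beta)^2, (\mathbf{x}\beta)^3].
\end{aligned}$$

Let $\tilde\beta$ denote the NLS estimator with $\delta_1 = \delta_2 = 0$ imposed. Then $\nabla_\beta m(\mathbf{x}_i,\tilde\theta) = g(\mathbf{x}_i\tilde\beta)\mathbf{x}_i$ and $\nabla_\delta m(\mathbf{x}_i,\tilde\theta) = g(\mathbf{x}_i\tilde\beta)[(\mathbf{x}_i\tilde\beta)^2, (\mathbf{x}_i\tilde\beta)^3]$. Therefore, the usual LM statistic can be obtained as $NR_u^2$ from the regression $\tilde u_i$ on $\tilde g_i\mathbf{x}_i$, $\tilde g_i\cdot(\mathbf{x}_i\tilde\beta)^2$, $\tilde g_i\cdot(\mathbf{x}_i\tilde\beta)^3$, where $\tilde g_i \equiv g(\mathbf{x}_i\tilde\beta)$. If $G(\cdot)$ is the identity function, $g(\cdot) \equiv 1$, and the auxiliary regression is

$$\tilde u_i \text{ on } \mathbf{x}_i,\ (\mathbf{x}_i\tilde\beta)^2,\ (\mathbf{x}_i\tilde\beta)^3,$$

which is a version of RESET.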

b. The VAT version of the test is obtained as follows. As with the LM test, first estimate the model under the null and obtain the NLS estimator, $\tilde\beta$, as before. Then estimate the auxiliary model with $(\mathbf{x}_i\tilde\beta)^2$ and $(\mathbf{x}_i\tilde\beta)^3$ as explanatory variables. In other words, act as if the mean function is

$$G[\mathbf{x}_i\beta + \delta_1(\mathbf{x}_i\tilde\beta)^2 + \delta_2(\mathbf{x}_i\tilde\beta)^3]$$

and estimate $\delta_1$ and $\delta_2$ along with $\beta$. A joint Wald test, made robust to heteroskedasticity if necessary, of $H_0: \delta_1 = 0, \delta_2 = 0$ is asymptotically equivalent (has the same asymptotic size and asymptotic power against local alternatives) to the LM test. Given the way modern software works, this often affords some computational simplification (albeit modest). When $G(\cdot)$ is the identity function, this variable addition approach gives the RESET test in its traditional form.

One danger in using the VAT is that it is tempting to use the second-step estimates of $\delta_1$, $\delta_2$, and even $\beta$ as generally valid estimators. But they are not. If the null is false, $\tilde\beta$ is inconsistent for $\beta$ (because $\tilde\beta$ is obtained with $\delta_1 = 0$, $\delta_2 = 0$ imposed) and so the added variables are not correct under the alternative. The VAT should be used only for testing purposes (just like the LM statistic).
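12.6. a. The pooled NLS estimator of $\theta_o$ solves

$$\min_\theta \sum_{i=1}^N\sum_{t=1}^T [y_{it} - m(\mathbf{x}_{it},\theta)]^2/2$$

and so, to put this into the standard M-estimation framework, we can take the objective function for a random draw $i$ to be $q_i(\theta) \equiv q(\mathbf{w}_i,\theta) = \sum_{t=1}^T [y_{it} - m(\mathbf{x}_{it},\theta)]^2/2$. The score for random draw $i$ is $\mathbf{s}_i(\theta) = \nabla_\theta q_i(\theta)' = -\sum_{t=1}^T \nabla_\theta m(\mathbf{x}_{it},\theta)'u_{it}(\theta)$. Without further assumptions, a consistent estimator of $\mathbf{B}_o$ is

$$\hat{\mathbf{B}} = N^{-1}\sum_{i=1}^N \mathbf{s}_i(\hat\theta)\mathbf{s}_i(\hat\theta)',$$

where $\hat\theta$ is the pooled NLS estimator. The Hessian for observation $i$, which can be computed as the Jacobian of the score, can be written as

$$\mathbf{H}_i(\theta) = \nabla_\theta\mathbf{s}_i(\theta) = -\sum_{t=1}^T \nabla_\theta^2 m(\mathbf{x}_{it},\theta)u_{it}(\theta) + \sum_{t=1}^T \nabla_\theta m(\mathbf{x}_{it},\theta)'\nabla_\theta m(\mathbf{x}_{it},\theta).$$

When we plug in $\theta_o$ and use the fact that $E(u_{it}|\mathbf{x}_{it}) = 0$, all $t = 1,\dots,T$, then

$$\begin{aligned}
\mathbf{A}_o \equiv E[\mathbf{H}_i(\theta_o)] &= -\sum_{t=1}^T E[\nabla_\theta^2 m(\mathbf{x}_{it},\theta_o)u_{it}] + \sum_{t=1}^T E[\nabla_\theta m(\mathbf{x}_{it},\theta_o)'\nabla_\theta m(\mathbf{x}_{it},\theta_o)] \\
&= \sum_{t=1}^T E[\nabla_\theta m(\mathbf{x}_{it},\theta_o)'\nabla_\theta m(\mathbf{x}_{it},\theta_o)]
\end{aligned}$$

because $E[\nabla_\theta^2 m(\mathbf{x}_{it},\theta_o)u_{it}] = \mathbf{0}$, $t = 1,\dots,T$, by iterated expectations. By the usual law of large numbers argument,

$$\hat{\mathbf{A}} = N^{-1}\sum_{i=1}^N\sum_{t=1}^T \nabla_\theta m(\mathbf{x}_{it},\hat\theta)'\nabla_\theta m(\mathbf{x}_{it},\hat\theta) \equiv N^{-1}\sum_{i=1}^N \hat{\mathbf{A}}_i$$

is a consistent estimator of $\mathbf{A}_o$. Then, we just use the usual sandwich formula in equation (12.49).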

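b. As in the hint we show that $\mathbf{B}_o = \sigma_o^2\mathbf{A}_o$. First, write $\mathbf{s}_i(\theta) = \sum_{t=1}^T \mathbf{s}_{it}(\theta)$, where $\mathbf{s}_{it}(\theta) = -\nabla_\theta m(\mathbf{x}_{it},\theta)'u_{it}(\theta)$. Under dynamic completeness of the mean, these scores are serially uncorrelated across $t$ (when evaluated at $\theta_o$, of course). The argument is very similar to the linear regression case from Chapter 7.

Let $r < t$ for concreteness. Then

$$E[\mathbf{s}_{it}(\theta_o)\mathbf{s}_{ir}(\theta_o)'|\mathbf{x}_{it},\mathbf{x}_{ir},u_{ir}] = E(u_{it}|\mathbf{x}_{it},\mathbf{x}_{ir},u_{ir})u_{ir}\nabla_\theta m(\mathbf{x}_{it},\theta_o)'\nabla_\theta m(\mathbf{x}_{ir},\theta_o) = \mathbf{0}$$

because $E(u_{it}|\mathbf{x}_{it},u_{i,t-1},\mathbf{x}_{i,t-1},\dots) = 0$ and $r < t$. Now apply the LIE to conclude $E[\mathbf{s}_{it}(\theta_o)\mathbf{s}_{ir}(\theta_o)'] = \mathbf{0}$. So we have shown that $\mathbf{B}_o = \sum_{t=1}^T E[\mathbf{s}_{it}(\theta_o)\mathbf{s}_{it}(\theta_o)']$. But for each $t$, apply iterated expectations:

$$\begin{aligned}
E[\mathbf{s}_{it}(\theta_o)\mathbf{s}_{it}(\theta_o)'] &= E[u_{it}^2\nabla_\theta m(\mathbf{x}_{it},\theta_o)'\nabla_\theta m(\mathbf{x}_{it},\theta_o)] \\
&= E[E(u_{it}^2|\mathbf{x}_{it})\nabla_\theta m(\mathbf{x}_{it},\theta_o)'\nabla_\theta m(\mathbf{x}_{it},\theta_o)] \\
&= \sigma_o^2 E[\nabla_\theta m(\mathbf{x}_{it},\theta_o)'\nabla_\theta m(\mathbf{x}_{it},\theta_o)],
\end{aligned}$$

where the last equality follows because $E(u_{it}^2|\mathbf{x}_{it}) = \sigma_o^2$. It follows that

$$\mathbf{B}_o = \sum_{t=1}^T E[\mathbf{s}_{it}(\theta_o)\mathbf{s}_{it}(\theta_o)'] = \sigma_o^2\sum_{t=1}^T E[\nabla_\theta m(\mathbf{x}_{it},\theta_o)'\nabla_\theta m(\mathbf{x}_{it},\theta_o)] = \sigma_o^2\mathbf{A}_o.$$

Next, the usual two-step estimation argument (see Lemma 12.1) shows that

$$(NT)^{-1}\sum_{i=1}^N\sum_{t=1}^T \hat u_{it}^2 \overset{p}{\to} \sigma_o^2 \quad \text{as } N \to \infty.$$

The degrees of freedom correction (putting $NT - P$ in place of $NT$) does not affect consistency. The variance matrix obtained by ignoring the time dimension and assuming homoskedasticity is simply

$$\hat\sigma^2\Big(\sum_{i=1}^N\sum_{t=1}^T \nabla_\theta\hat m_{it}'\nabla_\theta\hat m_{it}\Big)^{-1},$$

and we just showed that $N$ times this matrix is a consistent estimator of $\mathrm{Avar}\,\sqrt{N}(\hat\theta - \theta_o)$.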
c. As we just saw in part b, $\mathbf{B}_o = \sigma_o^2\mathbf{A}_o$, which means by slightly extending the argument


before (12.69) we can use an extension of the LM statistic there. Namely, 


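It is convenient to choose

$$\tilde{\mathbf{M}} = N^{-1}\sum_{i=1}^N\sum_{t=1}^T \nabla_\theta m(\mathbf{x}_{it},\tilde\theta)'\nabla_\theta m(\mathbf{x}_{it},\tilde\theta),$$

where $\tilde\theta = (\tilde\beta', \bar\delta')'$. So the LM statistic can be written as

$$LM = \Big(\sum_{i=1}^N\sum_{t=1}^T \tilde u_{it}\nabla_\theta m(\mathbf{x}_{it},\tilde\theta)\Big)\Big(\sum_{i=1}^N\sum_{t=1}^T \nabla_\theta m(\mathbf{x}_{it},\tilde\theta)'\nabla_\theta m(\mathbf{x}_{it},\tilde\theta)\Big)^{-1}\Big(\sum_{i=1}^N\sum_{t=1}^T \nabla_\theta m(\mathbf{x}_{it},\tilde\theta)'\tilde u_{it}\Big)\Big/\tilde\sigma^2,$$

where we take

$$\tilde\sigma^2 = (NT)^{-1}\sum_{i=1}^N\sum_{t=1}^T \tilde u_{it}^2.$$

(It is common not to use the degrees of freedom adjustment when estimating $\sigma_o^2$ under the null.) Finally, the LM statistic can be written as

$$LM = \frac{\big(\sum_{i}\sum_{t} \tilde u_{it}\nabla_\theta\tilde m_{it}\big)\big(\sum_{i}\sum_{t} \nabla_\theta\tilde m_{it}'\nabla_\theta\tilde m_{it}\big)^{-1}\big(\sum_{i}\sum_{t} \nabla_\theta\tilde m_{it}'\tilde u_{it}\big)}{(NT)^{-1}\sum_{i}\sum_{t}\tilde u_{it}^2} = NT\,R_u^2$$

because the numerator is the explained sum of squares from the pooled OLS regression

$$\tilde u_{it} \text{ on } \nabla_\theta m(\mathbf{x}_{it},\tilde\theta), \quad t = 1,\dots,T;\ i = 1,\dots,N,$$

and the denominator is the (uncentered) total sum of squares divided by $NT$.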

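12.7. a. For each $i$ and $g$, define $u_{ig} \equiv y_{ig} - m_g(\mathbf{x}_{ig},\theta_{og})$, so that $E(u_{ig}|\mathbf{x}_i) = 0$, $g = 1,\dots,G$. Further, let $\mathbf{u}_i$ be the $G \times 1$ vector containing the $u_{ig}$. Then $E(\mathbf{u}_i\mathbf{u}_i'|\mathbf{x}_i) = E(\mathbf{u}_i\mathbf{u}_i') = \Omega_o$. Let $\check{\mathbf{u}}_i$ be the vector of nonlinear least squares residuals for each observation $i$. That is, compute the NLS estimates for each equation $g$ and collect the residuals. Then, by standard arguments (apply Lemma 12.1), a consistent estimator of $\Omega_o$ is

$$\hat\Omega \equiv N^{-1}\sum_{i=1}^N \check{\mathbf{u}}_i\check{\mathbf{u}}_i'$$

because each NLS estimator, $\check\theta_g$, is consistent for $\theta_{og}$ as $N \to \infty$.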
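b. This part involves several steps, and I will sketch how each one goes. First, let $\gamma$ be the vector of distinct elements of $\Omega$, the nuisance parameters in the context of two-step M-estimation. Then, the score for observation $i$ is

$$\begin{aligned}
\mathbf{s}(\mathbf{w}_i,\theta;\gamma) &= -\nabla_\theta\mathbf{m}(\mathbf{x}_i,\theta)'\Omega^{-1}\mathbf{u}_i(\theta) \\
&= -[\mathbf{u}_i(\theta) \otimes \nabla_\theta\mathbf{m}(\mathbf{x}_i,\theta)]'\mathrm{vec}(\Omega^{-1}),
\end{aligned}$$

where $\mathbf{m}(\mathbf{x}_i,\theta)$ is the $G \times 1$ vector of conditional mean functions. With this expression we can verify condition (12.37), even though the actual derivatives are complicated. It is clear that $\nabla_\gamma\mathbf{s}(\mathbf{w}_i,\theta;\gamma)$ is a linear combination of $\mathbf{u}_i(\theta)$, where the linear combination is a function of $\mathbf{x}_i$ (and the parameter values). Therefore, because $E(\mathbf{u}_i|\mathbf{x}_i) = \mathbf{0}$, $E[\nabla_\gamma\mathbf{s}(\mathbf{w}_i,\theta_o;\gamma)|\mathbf{x}_i] = \mathbf{0}$ for any $\gamma$, that is, any $\Omega$. Its unconditional expectation is zero, too, which verifies (12.37). This shows that we do not have to adjust for the first-stage estimation of $\Omega_o$. (Note: This problem assumes that $\mathrm{Var}(\mathbf{u}_i|\mathbf{x}_i) = \Omega_o$, but it is clear that (12.37) holds without any assumption about $\mathrm{Var}(\mathbf{u}_i|\mathbf{x}_i)$. We just need the estimator we use, $\hat\Omega$, to converge to its limit at the usual $\sqrt{N}$ rate.)

Next we obtain $\mathbf{B}_o = E[\mathbf{s}_i(\theta_o;\gamma_o)\mathbf{s}_i(\theta_o;\gamma_o)']$:

$$\begin{aligned}
E[\mathbf{s}_i(\theta_o;\gamma_o)\mathbf{s}_i(\theta_o;\gamma_o)'] &= E[\nabla_\theta\mathbf{m}_i(\theta_o)'\Omega_o^{-1}\mathbf{u}_i\mathbf{u}_i'\Omega_o^{-1}\nabla_\theta\mathbf{m}_i(\theta_o)] \\
&= E\{E[\nabla_\theta\mathbf{m}_i(\theta_o)'\Omega_o^{-1}\mathbf{u}_i\mathbf{u}_i'\Omega_o^{-1}\nabla_\theta\mathbf{m}_i(\theta_o)|\mathbf{x}_i]\} \\
&= E[\nabla_\theta\mathbf{m}_i(\theta_o)'\Omega_o^{-1}E(\mathbf{u}_i\mathbf{u}_i'|\mathbf{x}_i)\Omega_o^{-1}\nabla_\theta\mathbf{m}_i(\theta_o)] \\
&= E[\nabla_\theta\mathbf{m}_i(\theta_o)'\Omega_o^{-1}\Omega_o\Omega_o^{-1}\nabla_\theta\mathbf{m}_i(\theta_o)] \\
&= E[\nabla_\theta\mathbf{m}_i(\theta_o)'\Omega_o^{-1}\nabla_\theta\mathbf{m}_i(\theta_o)].
\end{aligned}$$

Next, we have to derive $\mathbf{A}_o \equiv E[\mathbf{H}_i(\theta_o;\gamma_o)]$, and show that $\mathbf{B}_o = \mathbf{A}_o$. The Hessian itself is complicated, but its expected value is not. The Jacobian of $\mathbf{s}_i(\theta;\gamma)$ with respect to $\theta$ can be written

$$\mathbf{H}_i(\theta;\gamma) = \nabla_\theta\mathbf{m}(\mathbf{x}_i,\theta)'\Omega^{-1}\nabla_\theta\mathbf{m}(\mathbf{x}_i,\theta) + [\mathbf{I}_P \otimes \mathbf{u}_i(\theta)']\mathbf{F}(\mathbf{x}_i,\theta;\gamma),$$

where $\mathbf{F}(\mathbf{x}_i,\theta;\gamma)$ is a $GP \times P$ matrix, where $P$ is the total number of parameters, that involves Jacobians of the rows of $\Omega^{-1}\nabla_\theta\mathbf{m}_i(\theta)$ with respect to $\theta$. The key is that $\mathbf{F}(\mathbf{x}_i,\theta;\gamma)$ depends on $\mathbf{x}_i$, not on $\mathbf{y}_i$. So,

$$\begin{aligned}
E[\mathbf{H}_i(\theta_o;\gamma_o)|\mathbf{x}_i] &= \nabla_\theta\mathbf{m}_i(\theta_o)'\Omega_o^{-1}\nabla_\theta\mathbf{m}_i(\theta_o) + [\mathbf{I}_P \otimes E(\mathbf{u}_i|\mathbf{x}_i)']\mathbf{F}(\mathbf{x}_i,\theta_o;\gamma_o) \\
&= \nabla_\theta\mathbf{m}_i(\theta_o)'\Omega_o^{-1}\nabla_\theta\mathbf{m}_i(\theta_o).
\end{aligned}$$

Now iterated expectations gives $\mathbf{A}_o = E[\nabla_\theta\mathbf{m}_i(\theta_o)'\Omega_o^{-1}\nabla_\theta\mathbf{m}_i(\theta_o)]$. We have verified (12.37) and also that $\mathbf{A}_o = \mathbf{B}_o$. Therefore, from Theorem 12.3,

$$\mathrm{Avar}\,\sqrt{N}(\hat\theta - \theta_o) = \mathbf{A}_o^{-1} = \{E[\nabla_\theta\mathbf{m}_i(\theta_o)'\Omega_o^{-1}\nabla_\theta\mathbf{m}_i(\theta_o)]\}^{-1}.$$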

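c. As usual, we replace expectations with sample averages and unknown parameters with their estimators, and divide the result by $N$ to get $\widehat{\mathrm{Avar}}(\hat\theta)$:

$$\begin{aligned}
\widehat{\mathrm{Avar}}(\hat\theta) &= \Big(N^{-1}\sum_{i=1}^N \nabla_\theta\mathbf{m}_i(\hat\theta)'\hat\Omega^{-1}\nabla_\theta\mathbf{m}_i(\hat\theta)\Big)^{-1}\Big/N \\
&= \Big(\sum_{i=1}^N \nabla_\theta\mathbf{m}_i(\hat\theta)'\hat\Omega^{-1}\nabla_\theta\mathbf{m}_i(\hat\theta)\Big)^{-1}.
\end{aligned}$$

The estimate $\hat\Omega$ can be based on the multivariate NLS residuals or can be updated after the nonlinear SUR estimates have been obtained.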

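d. First, note that $\nabla_\theta\mathbf{m}_i(\theta_o)$ is a block-diagonal matrix that has $G$ rows, with blocks $\nabla_{\theta_g}m_{ig}(\theta_{og})$, a $1 \times P_g$ vector. (I assume that there are no cross-equation restrictions imposed in the nonlinear SUR estimation.) If $\Omega_o$ is diagonal, so is its inverse. Standard matrix multiplication shows that

$$\nabla_\theta\mathbf{m}_i(\theta_o)'\Omega_o^{-1}\nabla_\theta\mathbf{m}_i(\theta_o) = \begin{pmatrix}
\sigma_{o1}^{-2}\nabla_{\theta_1}m_{i1}^{o\prime}\nabla_{\theta_1}m_{i1}^{o} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \sigma_{o2}^{-2}\nabla_{\theta_2}m_{i2}^{o\prime}\nabla_{\theta_2}m_{i2}^{o} & & \vdots \\
\vdots & & \ddots & \mathbf{0} \\
\mathbf{0} & \cdots & \mathbf{0} & \sigma_{oG}^{-2}\nabla_{\theta_G}m_{iG}^{o\prime}\nabla_{\theta_G}m_{iG}^{o}
\end{pmatrix}.$$

Taking expectations and inverting the result shows that $\mathrm{Avar}\,\sqrt{N}(\hat\theta_g - \theta_{og}) = \sigma_{og}^2[E(\nabla_{\theta_g}m_{ig}^{o\prime}\nabla_{\theta_g}m_{ig}^{o})]^{-1}$, $g = 1,\dots,G$. (Note also that the nonlinear SUR estimators are asymptotically uncorrelated across equations.) These asymptotic variances are easily seen to be the same as those for nonlinear least squares on each equation.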

e. I cannot see a nonlinear analog of Theorem 7.7. The first hint given in Problem 7.5 does not extend readily to nonlinear models, even when the same regressors appear in each equation. The key is that $\mathbf{X}_i$ is replaced with $\nabla_\theta\mathbf{m}(\mathbf{x}_i,\theta_o)$. While this $G \times P$ matrix has a block-diagonal form, as described in part d, the blocks are not the same even when the same regressors appear in each equation. In the linear case, $\nabla_{\theta_g}m_g(\mathbf{x}_i,\theta_{og}) = \mathbf{x}_i$ for all $g$. But, unless $\theta_{og}$ is the same in all equations (a very restrictive assumption), $\nabla_{\theta_g}m_g(\mathbf{x}_i,\theta_{og})$ varies across $g$. For example, if $m_g(\mathbf{x}_i,\theta_{og}) = \exp(\mathbf{x}_i\theta_{og})$ then $\nabla_{\theta_g}m_g(\mathbf{x}_i,\theta_{og}) = \exp(\mathbf{x}_i\theta_{og})\mathbf{x}_i$, and the gradients differ across $g$.


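12.8. As stated in the hint, we can use (12.37) and a modified version of (12.76),

$$N^{-1/2}\sum_{i=1}^N \mathbf{s}_i(\tilde\theta;\hat\gamma) = N^{-1/2}\sum_{i=1}^N \mathbf{s}_i(\hat\theta;\hat\gamma) + \mathbf{A}_o\sqrt{N}(\tilde\theta - \hat\theta) + o_p(1),$$

to show $\sqrt{N}(\tilde\theta - \hat\theta) = \mathbf{A}_o^{-1}N^{-1/2}\sum_{i=1}^N \mathbf{s}_i(\tilde\theta;\hat\gamma) + o_p(1)$; this is just standard algebra, using the fact that the unrestricted first-order condition sets $\sum_{i=1}^N \mathbf{s}_i(\hat\theta;\hat\gamma) = \mathbf{0}$. Under (12.37),

$$N^{-1/2}\sum_{i=1}^N \mathbf{s}_i(\tilde\theta;\hat\gamma) = N^{-1/2}\sum_{i=1}^N \mathbf{s}_i(\tilde\theta;\gamma_o) + o_p(1),$$

by a similar mean value expansion used for the unconstrained two-step M-estimator:

$$N^{-1/2}\sum_{i=1}^N \mathbf{s}_i(\tilde\theta;\hat\gamma) = N^{-1/2}\sum_{i=1}^N \mathbf{s}_i(\tilde\theta;\gamma_o) + E[\nabla_\gamma\mathbf{s}_i(\theta_o;\gamma_o)]\sqrt{N}(\hat\gamma - \gamma_o) + o_p(1),$$

and use $E[\nabla_\gamma\mathbf{s}_i(\theta_o;\gamma_o)] = \mathbf{0}$. Now, the second-order Taylor expansion gives

$$\begin{aligned}
\sum_{i=1}^N q(\mathbf{w}_i,\tilde\theta;\hat\gamma) - \sum_{i=1}^N q(\mathbf{w}_i,\hat\theta;\hat\gamma) &= \Big(\sum_{i=1}^N \hat{\mathbf{s}}_i\Big)'(\tilde\theta - \hat\theta) + (1/2)(\tilde\theta - \hat\theta)'\Big(\sum_{i=1}^N \ddot{\mathbf{H}}_i\Big)(\tilde\theta - \hat\theta) \\
&= (1/2)(\tilde\theta - \hat\theta)'\Big(\sum_{i=1}^N \ddot{\mathbf{H}}_i\Big)(\tilde\theta - \hat\theta),
\end{aligned}$$

where $\ddot{\mathbf{H}}_i$ denotes the Hessian evaluated at mean values between $\tilde\theta$ and $\hat\theta$. Therefore,

$$\begin{aligned}
2\Big[\sum_{i=1}^N q(\mathbf{w}_i,\tilde\theta;\hat\gamma) - \sum_{i=1}^N q(\mathbf{w}_i,\hat\theta;\hat\gamma)\Big] &= [\sqrt{N}(\tilde\theta - \hat\theta)]'\mathbf{A}_o[\sqrt{N}(\tilde\theta - \hat\theta)] + o_p(1) \\
&= \Big(N^{-1/2}\sum_{i=1}^N \tilde{\mathbf{s}}_i\Big)'\mathbf{A}_o^{-1}\Big(N^{-1/2}\sum_{i=1}^N \tilde{\mathbf{s}}_i\Big) + o_p(1),
\end{aligned}$$

where $\tilde{\mathbf{s}}_i = \mathbf{s}_i(\tilde\theta;\gamma_o)$. Again, this shows the asymptotic equivalence of the QLR and LM statistics. To complete the problem, we should verify that the LM statistic is not affected by $\hat\gamma$ either, but that follows from $N^{-1/2}\sum_{i=1}^N \mathbf{s}_i(\tilde\theta;\hat\gamma) = N^{-1/2}\sum_{i=1}^N \mathbf{s}_i(\tilde\theta;\gamma_o) + o_p(1)$.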

12.9. a. We cannot say anything in general about $\mathrm{Med}(y|\mathbf{x})$ because

$$\mathrm{Med}(y|\mathbf{x}) = m(\mathbf{x},\beta_o) + \mathrm{Med}(u|\mathbf{x})$$

and $\mathrm{Med}(u|\mathbf{x})$ could be a general function of $\mathbf{x}$.

b. If $u$ and $\mathbf{x}$ are independent, then $E(u|\mathbf{x})$ and $\mathrm{Med}(u|\mathbf{x})$ are both constants, say $\alpha$ and $\delta$. Then $E(y|\mathbf{x}) - \mathrm{Med}(y|\mathbf{x}) = [m(\mathbf{x},\beta_o) + \alpha] - [m(\mathbf{x},\beta_o) + \delta] = \alpha - \delta$, which does not depend on $\mathbf{x}$.

c. When $u$ and $\mathbf{x}$ are independent, the partial effects of $x_j$ on the conditional mean and conditional median are the same, and there is no ambiguity about what is "the effect of $x_j$ on $y$," at least when only the mean and median are under consideration. In this case, we could interpret large differences between LAD and NLS as perhaps indicating an outlier problem. But it could be just that $u$ and $\mathbf{x}$ are not independent, and so the function $m(\mathbf{x},\beta_o)$ cannot be both the mean and the median (or differ from each of these by a constant).
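Part b can be illustrated with a small discrete example. In the sketch below, the error distribution `U_VALUES` and the function `m` are invented for illustration; because the same error distribution is used at every value of x (independence), the gap between the conditional mean and the conditional median is the same constant everywhere.

```python
from statistics import mean, median

U_VALUES = [-1.0, 0.0, 4.0]   # hypothetical error distribution: mean 1, median 0


def m(x):
    """Hypothetical regression function m(x, beta) = 2x."""
    return 2.0 * x


def gap(x):
    """E(y|x) - Med(y|x) when y = m(x) + u and u is independent of x."""
    ys = [m(x) + u for u in U_VALUES]
    return mean(ys) - median(ys)


gaps = [gap(x) for x in (0.0, 1.0, 2.5)]
print(gaps)
```

Each entry equals the mean of u minus the median of u, so the mean and median functions have identical slopes in x and differ only in their intercepts.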

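12.10. The conditional mean function is $m(\mathbf{x}_i,n_i,\beta) = n_i p(\mathbf{x}_i,\beta)$. So we would, as usual, minimize the sum of squared residuals, $\sum_{i=1}^N [y_i - n_i p(\mathbf{x}_i,\beta)]^2$, with respect to $\beta$. This gives the NLS estimator, say $\check\beta$. Define the weights as $\hat h_i = n_i p(\mathbf{x}_i,\check\beta)[1 - p(\mathbf{x}_i,\check\beta)]$. Then the weighted NLS estimator minimizes $\sum_{i=1}^N [y_i - n_i p(\mathbf{x}_i,\beta)]^2/\hat h_i$.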
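The weights and the weighted objective are simple to compute once a functional form for $p(\mathbf{x},\beta)$ is chosen. The sketch below assumes a logistic form purely for illustration; the problem does not require it, and the names `p_logit`, `wnls_weight`, and `wnls_objective` are hypothetical.

```python
import math


def p_logit(x, beta):
    """Assumed form for p(x, beta): the logistic function of the index x*beta."""
    index = sum(a * b for a, b in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-index))


def wnls_weight(n, x, beta_first_stage):
    """h_i = n_i * p_i * (1 - p_i), evaluated at the first-stage NLS estimate."""
    p = p_logit(x, beta_first_stage)
    return n * p * (1.0 - p)


def wnls_objective(data, beta, weights):
    """Sum of [y_i - n_i*p(x_i, beta)]^2 / h_i over observations (y_i, n_i, x_i)."""
    return sum((y - n * p_logit(x, beta)) ** 2 / h
               for (y, n, x), h in zip(data, weights))


# Tiny made-up data set: (y_i, n_i, x_i) with x_i including a constant.
data = [(3.0, 10, (1.0, 0.0)), (7.0, 10, (1.0, 1.0))]
beta_check = (0.0, 1.0)   # stands in for a first-stage NLS estimate
weights = [wnls_weight(n, x, beta_check) for (_, n, x) in data]
print(weights, wnls_objective(data, beta_check, weights))
```

The weights are held fixed at their first-stage values while the objective is minimized over $\beta$, which is what makes this a two-step weighted NLS procedure.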

12.11. a. The key to the derivation is to verify condition (12.37), which is similar to Problem 12.7. In fact, this contains Problem 12.7 as a special case. In particular, write the score (with respect to $\theta$) for observation $i$ as

$$s(w_i,\theta;\gamma) = -\nabla_\theta m(x_i,\theta)'[W(x_i,\gamma)]^{-1}u_i(\theta) = -[u_i(\theta)\otimes\nabla_\theta m(x_i,\theta)]'\,\mathrm{vec}\{[W_i(\gamma)]^{-1}\}.$$

The Jacobian of $s(w_i,\theta;\gamma)$ with respect to $\gamma$ is generally complicated, but it is clear that $\nabla_\gamma s(w_i,\theta;\gamma)$ is a linear combination of $u_i(\theta)$, where the linear combination is a function of $x_i$ (and the parameter values). Therefore, because $E(u_i|x_i) = 0$, $E[\nabla_\gamma s(w_i,\theta_o;\gamma)|x_i] = 0$ for any $\gamma$, which verifies (12.37). Notice that we do not need to assume $\mathrm{Var}(u_i|x_i) = W(x_i,\gamma_o)$ for some $\gamma_o$.


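Without assuming (12.96) there are no simplifications for

$$B_o = E[s_i(\theta_o;\gamma^*)s_i(\theta_o;\gamma^*)'] = E\{\nabla_\theta m_i(\theta_o)'[W_i(\gamma^*)]^{-1}u_iu_i'[W_i(\gamma^*)]^{-1}\nabla_\theta m_i(\theta_o)\},$$

where $\gamma^* = \mathrm{plim}(\hat\gamma)$. A consistent estimator of $B_o$ is
$$\hat B = N^{-1}\sum_{i=1}^N \nabla_\theta m_i(\hat\theta)'[W_i(\hat\gamma)]^{-1}\hat u_i\hat u_i'[W_i(\hat\gamma)]^{-1}\nabla_\theta m_i(\hat\theta),$$
where $\hat\theta$ is the WMNLS estimator.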

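We also need to consistently estimate $A_o = E[H_i(\theta_o;\gamma^*)]$. Again, the argument is similar to that in Problem 12.7, and uses that the mean function is correctly specified. We can write the Hessian as

$$H_i(\theta;\gamma) = \nabla_\theta m_i(\theta)'[W_i(\gamma)]^{-1}\nabla_\theta m_i(\theta) + [I_P\otimes u_i(\theta)']F(x_i,\theta;\gamma),$$
where $F(x_i,\theta;\gamma)$ is a $GP\times P$ matrix, where $P$ is the total number of parameters, that involves Jacobians of the rows of $[W_i(\gamma)]^{-1}\nabla_\theta m_i(\theta)$ with respect to $\theta$. Therefore,

$$E[H_i(\theta_o;\gamma^*)|x_i] = \nabla_\theta m_i(\theta_o)'[W_i(\gamma^*)]^{-1}\nabla_\theta m_i(\theta_o) + [I_P\otimes E(u_i|x_i)']F(x_i,\theta_o;\gamma^*) = \nabla_\theta m_i(\theta_o)'[W_i(\gamma^*)]^{-1}\nabla_\theta m_i(\theta_o),$$
and so $A_o = E\{\nabla_\theta m_i(\theta_o)'[W_i(\gamma^*)]^{-1}\nabla_\theta m_i(\theta_o)\}$. A consistent estimator of $A_o$ is

$$\hat A = N^{-1}\sum_{i=1}^N \nabla_\theta m_i(\hat\theta)'[W_i(\hat\gamma)]^{-1}\nabla_\theta m_i(\hat\theta).$$

When we form
$$\widehat{\mathrm{Avar}}(\hat\theta) = \hat A^{-1}\hat B\hat A^{-1}/N,$$
simple algebra shows this expression is the same as (12.98).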
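b. If we assume (12.96) then

$$\begin{aligned}
B_o &= E\big(E\{\nabla_\theta m_i(\theta_o)'[W_i(\gamma_o)]^{-1}u_iu_i'[W_i(\gamma_o)]^{-1}\nabla_\theta m_i(\theta_o)\,|\,x_i\}\big) \\
&= E\{\nabla_\theta m_i(\theta_o)'[W_i(\gamma_o)]^{-1}E(u_iu_i'|x_i)[W_i(\gamma_o)]^{-1}\nabla_\theta m_i(\theta_o)\} \\
&= E\{\nabla_\theta m_i(\theta_o)'[W_i(\gamma_o)]^{-1}W_i(\gamma_o)[W_i(\gamma_o)]^{-1}\nabla_\theta m_i(\theta_o)\} \\
&= A_o,
\end{aligned}$$

and so
$$\mathrm{Avar}[\sqrt N(\hat\theta-\theta_o)] = A_o^{-1}.$$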


c. We can apply Problem 12.8 once we have properly chosen the objective function to ensure $B_o = A_o$ when (12.96) holds. That objective function, with nuisance parameters $\gamma$, is
$$q(w_i,\theta;\gamma) = (1/2)[y_i - m(x_i,\theta)]'[W_i(\gamma)]^{-1}[y_i - m(x_i,\theta)].$$
The division by two ensures
$$E[s_i(\theta_o;\gamma_o)s_i(\theta_o;\gamma_o)'] = E[H_i(\theta_o;\gamma_o)].$$

Now, letting $\tilde\theta$ and $\hat\theta$ be the restricted and unrestricted estimators, respectively, where both use $\hat\gamma$ as the nuisance parameter estimator, the QLR statistic is
$$QLR = \sum_{i=1}^N \tilde u_i'\hat W_i^{-1}\tilde u_i - \sum_{i=1}^N \hat u_i'\hat W_i^{-1}\hat u_i,$$
where $\hat W_i = W_i(\hat\gamma)$, $\tilde u_i = y_i - m(x_i,\tilde\theta)$, and $\hat u_i = y_i - m(x_i,\hat\theta)$. Under $H_0$ and standard regularity conditions, $QLR \overset{d}{\to} \chi^2_Q$, where $Q$ is the number of restrictions.

An F-type statistic is obtained as
$$F = \frac{\left(\sum_{i=1}^N \tilde u_i'\hat W_i^{-1}\tilde u_i - \sum_{i=1}^N \hat u_i'\hat W_i^{-1}\hat u_i\right)/Q}{\left(\sum_{i=1}^N \hat u_i'\hat W_i^{-1}\hat u_i\right)/(NG-P)},$$
which can be treated as an approximate $F_{Q,NG-P}$ random variable. Note that under (12.96)

$$E[u_i'W_i(\gamma_o)^{-1}u_i] = E\{E[u_i'W_i(\gamma_o)^{-1}u_i|x_i]\} = E\{\mathrm{tr}\,E[W_i(\gamma_o)^{-1}u_iu_i'|x_i]\} = G$$

because $E(u_iu_i'|x_i) = W_i(\gamma_o)$. Therefore,
$$(NG)^{-1}\sum_{i=1}^N \hat u_i'\hat W_i^{-1}\hat u_i \overset{p}{\to} 1,$$
and using $NG - P$ is a degrees-of-freedom adjustment.
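As a small arithmetic sketch of part c, the QLR and F-type statistics can be computed directly from the two weighted sums of squared residuals. The function name, argument names, and the numbers below are hypothetical; the weighted sums are assumed to have been computed already with the same $\hat W_i$.

```python
def qlr_and_f(ssr_restricted, ssr_unrestricted, n_obs, n_eq, n_params, n_restr):
    """Return (QLR, F) from weighted SSRs.

    ssr_restricted   : sum of u_tilde' W_hat^{-1} u_tilde
    ssr_unrestricted : sum of u_hat'   W_hat^{-1} u_hat
    n_obs, n_eq      : N and G
    n_params         : P, number of parameters in the unrestricted model
    n_restr          : Q, number of restrictions
    """
    qlr = ssr_restricted - ssr_unrestricted
    f_stat = (qlr / n_restr) / (ssr_unrestricted / (n_obs * n_eq - n_params))
    return qlr, f_stat

# Made-up inputs for illustration only.
qlr, f_stat = qlr_and_f(130.0, 120.0, n_obs=50, n_eq=3, n_params=6, n_restr=2)
```

QLR would be compared with a chi-square distribution with Q degrees of freedom, and F with an F distribution with (Q, NG − P) degrees of freedom.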
12.12. a. We can appeal to equation (12.41) and the discussion that follows about the scores for the two problems being uncorrelated. We have
$$s_i(\theta;\delta) = -\nabla_\theta m(x_i,v(w_i,\delta),\theta)'[y_i - m(x_i,v(w_i,\delta),\theta)]$$
and we know, because $E(y_i|x_i,w_i) = m(x_i,v(w_i,\delta_o),\theta_o)$,
$$E[s_i(\theta_o;\delta_o)|x_i,w_i] = 0.$$

As usual, this means any function of $(x_i,w_i)$ is uncorrelated with $s_i(\theta_o;\delta_o)$, including $r(w_i,\delta_o)$. It follows that
$$D_o = B_o + F_oE[r(w_i,\delta_o)r(w_i,\delta_o)']F_o',$$
where

$$B_o = E[s_i(\theta_o;\delta_o)s_i(\theta_o;\delta_o)'], \qquad F_o = E[\nabla_\delta s_i(\theta_o;\delta_o)].$$

The matrix $F_oE[r(w_i,\delta_o)r(w_i,\delta_o)']F_o'$ is at least p.s.d., and so $D_o - B_o$ is p.s.d. The asymptotic variance of the two-step estimator of $\theta_o$ (standardized by $\sqrt N$) is
$$A_o^{-1}D_oA_o^{-1}$$
and that of the estimator where $\delta_o$ is known is
$$A_o^{-1}B_oA_o^{-1}.$$
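b. To estimate $A_o$ under correct specification of the mean it is convenient to use
$$A_o = E[\nabla_\theta m(x_i,v(w_i,\delta_o),\theta_o)'\nabla_\theta m(x_i,v(w_i,\delta_o),\theta_o)]$$
and so
$$\hat A = N^{-1}\sum_{i=1}^N \nabla_\theta m(x_i,v(w_i,\hat\delta),\hat\theta)'\nabla_\theta m(x_i,v(w_i,\hat\delta),\hat\theta).$$

Further,
$$\hat B = N^{-1}\sum_{i=1}^N s_i(\hat\theta;\hat\delta)s_i(\hat\theta;\hat\delta)'.$$

It remains to consistently estimate $F_o$. But by the product and chain rules,

$$\nabla_\delta s_i(\theta;\delta) = -\nabla_\delta\{\nabla_\theta m(x_i,v(w_i,\delta),\theta)'\}\cdot[y_i - m(x_i,v(w_i,\delta),\theta)] + \nabla_\theta m(x_i,v(w_i,\delta),\theta)'\nabla_v m(x_i,v(w_i,\delta),\theta)\nabla_\delta v(w_i,\delta).$$

When we plug in $(\theta_o,\delta_o)$ the first term has zero mean because the conditional mean is correctly specified, much like the argument for the Hessian. Therefore,

$$F_o = E[\nabla_\delta s_i(\theta_o;\delta_o)] = E[\nabla_\theta m(x_i,v(w_i,\delta_o),\theta_o)'\nabla_v m(x_i,v(w_i,\delta_o),\theta_o)\nabla_\delta v(w_i,\delta_o)]$$

and

$$\hat F = N^{-1}\sum_{i=1}^N \nabla_\theta m(x_i,v(w_i,\hat\delta),\hat\theta)'\nabla_v m(x_i,v(w_i,\hat\delta),\hat\theta)\nabla_\delta v(w_i,\hat\delta)$$

is consistent for $F_o$. Finally, let

$$\hat C = N^{-1}\sum_{i=1}^N r_i(\hat\delta)r_i(\hat\delta)'.$$

Then
$$\widehat{\mathrm{Avar}}[\sqrt N(\hat\theta-\theta_o)] = \hat A^{-1}(\hat B + \hat F\hat C\hat F')\hat A^{-1},$$
which, numerically, will always be larger (in the matrix sense) than $\hat A^{-1}\hat B\hat A^{-1}$.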


12.13. a. Strict exogeneity is not needed because the population objective function is

$$(1/2)\sum_{t=1}^T E\{[y_{it} - m(x_{it},\theta)]^2/h(x_{it},\gamma^*)\},$$
and $\theta_o$ minimizes this function provided
$$E(y_{it}|x_{it}) = m(x_{it},\theta_o), \quad t = 1,\ldots,T.$$
We do not need
$$E(y_{it}|x_{i1},x_{i2},\ldots,x_{iT}) = m(x_{it},\theta_o).$$
The proof of the claim that $\theta_o$ is a minimizer of the population objective function could use the score (assuming that $m(x_t,\cdot)$ is continuously differentiable and $\theta_o\in\mathrm{int}(\Theta)$) and follow Problem 12.4. But we can show directly that, for each $t = 1,\ldots,T$,
$$E\{[y_{it} - m(x_{it},\theta_o)]^2/h(x_{it},\gamma^*)\} \le E\{[y_{it} - m(x_{it},\theta)]^2/h(x_{it},\gamma^*)\}, \quad \theta\in\Theta,$$
and then the inequality clearly holds when we sum over $t$. Identification requires that strict inequality holds for $\theta\ne\theta_o$ when we sum across $t$.


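To establish the above inequality, we follow Problem 12.1. Applied to a given $t$, we have
$$E\{[y_{it} - m(x_{it},\theta_o)]^2\,|\,x_{it}\} \le E\{[y_{it} - m(x_{it},\theta)]^2\,|\,x_{it}\}$$
for any $x_{it}$. Because $h(x_{it},\gamma^*) > 0$ the inequality continues to hold if we divide each side by $h(x_{it},\gamma^*)$. Further, because $h(x_{it},\gamma^*)$ is a function of $x_{it}$, we can bring it inside both conditional expectations:

$$E\left\{\frac{[y_{it} - m(x_{it},\theta_o)]^2}{h(x_{it},\gamma^*)}\,\Big|\,x_{it}\right\} \le E\left\{\frac{[y_{it} - m(x_{it},\theta)]^2}{h(x_{it},\gamma^*)}\,\Big|\,x_{it}\right\},$$

and then take the expected value with respect to $x_{it}$ on both sides.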


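b. For each $t$ we can use the same argument for WNLS on a single cross section to show
$$E[\nabla_\gamma s_{it}(\theta_o;\gamma)] = 0$$
for any $\gamma$, where
$$s_{it}(\theta;\gamma) = -\nabla_\theta m(x_{it},\theta)'u_{it}(\theta)/h(x_{it},\gamma).$$

Because
$$s_i(\theta;\gamma) = \sum_{t=1}^T s_{it}(\theta;\gamma),$$
it follows that condition (12.37) holds, so we can ignore the estimation of $\gamma^*$ in obtaining $\mathrm{Avar}[\sqrt N(\hat\theta-\theta_o)]$. But then we just need to estimate
$$B_o = E[s_i(\theta_o;\gamma^*)s_i(\theta_o;\gamma^*)']$$
and
$$A_o = E[H_i(\theta_o;\gamma^*)] = \sum_{t=1}^T E\{\nabla_\theta m(x_{it},\theta_o)'\nabla_\theta m(x_{it},\theta_o)/h(x_{it},\gamma^*)\},$$
where the second equality uses $E(u_{it}|x_{it}) = 0$ with $u_{it} = y_{it} - m(x_{it},\theta_o)$. Consistent estimators are

$$\hat B = N^{-1}\sum_{i=1}^N s_i(\hat\theta;\hat\gamma)s_i(\hat\theta;\hat\gamma)' = N^{-1}\sum_{i=1}^N\left[\sum_{t=1}^T \hat u_{it}\nabla_\theta m(x_{it},\hat\theta)'/h(x_{it},\hat\gamma)\right]\left[\sum_{t=1}^T \hat u_{it}\nabla_\theta m(x_{it},\hat\theta)'/h(x_{it},\hat\gamma)\right]'$$

and
$$\hat A = N^{-1}\sum_{i=1}^N\sum_{t=1}^T \nabla_\theta m(x_{it},\hat\theta)'\nabla_\theta m(x_{it},\hat\theta)/h(x_{it},\hat\gamma).$$

Notice how $\hat B$ includes terms involving $\hat u_{it}\hat u_{ir}$ for $t\ne r$, thereby allowing for serial correlation. Further, terms involving $\hat u_{it}^2/[h(x_{it},\hat\gamma)]^2$ mean we are not assuming the variance function is correctly specified.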


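c. For any $\gamma$ we can write
$$s_{it}(\theta_o;\gamma)s_{ir}(\theta_o;\gamma)' = u_{it}u_{ir}\nabla_\theta m(x_{it},\theta_o)'\nabla_\theta m(x_{ir},\theta_o)/[h(x_{it},\gamma)h(x_{ir},\gamma)].$$

Take $r < t$. Then, by dynamic completeness, that is, $E(u_{it}|x_{it},y_{i,t-1},x_{i,t-1},\ldots,y_{i1},x_{i1}) = 0$, we have $E(u_{it}|u_{ir},x_{it},x_{ir}) = 0$, and so
$$E[s_{it}(\theta_o;\gamma)|u_{ir},x_{it},x_{ir}] = 0.$$
Therefore,
$$E[s_{it}(\theta_o;\gamma)s_{ir}(\theta_o;\gamma)'|u_{ir},x_{it},x_{ir}] = 0$$
and so $E[s_{it}(\theta_o;\gamma)s_{ir}(\theta_o;\gamma)'] = 0$. It follows that we need not estimate the terms $E[s_{it}(\theta_o;\gamma)s_{ir}(\theta_o;\gamma)']$, $t\ne r$, and so a consistent estimator of $B_o$ is
$$N^{-1}\sum_{i=1}^N\sum_{t=1}^T \hat u_{it}^2\nabla_\theta m(x_{it},\hat\theta)'\nabla_\theta m(x_{it},\hat\theta)/[h(x_{it},\hat\gamma)]^2,$$
and we need not change $\hat A$ because the conditional mean for each $t$ is assumed to be correctly specified.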

Remember that, because our analysis is for fixed $T$ and $N\to\infty$, and we are using the usual $\sqrt N$-limiting distribution, there is nothing wrong with using the fully robust form even under dynamic completeness. There is a sense that imposing zero correlation in the scores when they are uncorrelated leads to better finite-sample inference, but that is difficult to establish in any generality.
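To make the difference between the two estimators of $B_o$ concrete, here is a minimal sketch for a scalar parameter ($P = 1$), so that each score $s_{it}$ is a number. The function names and the score values are hypothetical; in practice the scores would be evaluated at $(\hat\theta,\hat\gamma)$.

```python
def b_hat_fully_robust(scores):
    """N^{-1} * sum_i (sum_t s_it)^2: keeps the cross-period products s_it * s_ir."""
    n = len(scores)
    return sum(sum(s_i) ** 2 for s_i in scores) / n

def b_hat_no_serial_corr(scores):
    """N^{-1} * sum_i sum_t s_it^2: drops cross-period products, as is
    justified under dynamic completeness."""
    n = len(scores)
    return sum(sum(s * s for s in s_i) for s_i in scores) / n

# Made-up scalar scores for N = 3 units and T = 2 periods.
scores = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]]
b_robust = b_hat_fully_robust(scores)
b_simple = b_hat_no_serial_corr(scores)
```

In this made-up sample the scores are positively correlated across periods, so the fully robust value exceeds the one that ignores serial correlation.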

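d. Again, we can keep $\hat A$ the same. For $\hat B$ we can use either of the estimators in parts b or c. But if we want to use both dynamic completeness and a correctly specified conditional variance, we can simplify $\hat B$ even further:

$$\begin{aligned}
B_o &= \sum_{t=1}^T E\{u_{it}^2\nabla_\theta m(x_{it},\theta_o)'\nabla_\theta m(x_{it},\theta_o)/[h(x_{it},\gamma_o)]^2\} \\
&= \sum_{t=1}^T E\big(E\{u_{it}^2\nabla_\theta m(x_{it},\theta_o)'\nabla_\theta m(x_{it},\theta_o)/[h(x_{it},\gamma_o)]^2\,|\,x_{it}\}\big) \\
&= \sum_{t=1}^T E\{\sigma_o^2h(x_{it},\gamma_o)\nabla_\theta m(x_{it},\theta_o)'\nabla_\theta m(x_{it},\theta_o)/[h(x_{it},\gamma_o)]^2\} \\
&= \sigma_o^2\sum_{t=1}^T E\{\nabla_\theta m(x_{it},\theta_o)'\nabla_\theta m(x_{it},\theta_o)/h(x_{it},\gamma_o)\} = \sigma_o^2A_o.
\end{aligned}$$

So $A_o^{-1}B_oA_o^{-1} = \sigma_o^2A_o^{-1}$, and we can use
$$\widehat{\mathrm{Avar}}(\hat\theta) = \hat\sigma^2\hat A^{-1}/N,$$
where
$$\hat\sigma^2 = (NT-P)^{-1}\sum_{i=1}^N\sum_{t=1}^T \hat u_{it}^2/h(x_{it},\hat\gamma)$$
is easily shown to be consistent for $\sigma_o^2$: by iterated expectations,
$$E[u_{it}^2/h(x_{it},\gamma_o)] = \sigma_o^2, \quad t = 1,\ldots,T.$$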
12.14. Write the score evaluated at $\theta_o$ as

$$s_i(\theta_o) = -x_i'\{\tau 1[y_i - x_i\theta_o \ge 0] - (1-\tau)1[y_i - x_i\theta_o < 0]\} = -x_i'\{\tau 1[u_i \ge 0] - (1-\tau)1[u_i < 0]\},$$

where $u_i = y_i - x_i\theta_o$. Therefore,

$$s_i(\theta_o)s_i(\theta_o)' = \{\tau 1[u_i \ge 0] - (1-\tau)1[u_i < 0]\}^2x_i'x_i = \{\tau^2 1[u_i \ge 0] + (1-\tau)^2 1[u_i < 0]\}x_i'x_i,$$
where this expression uses the hint that $1[u_i \ge 0]\cdot 1[u_i < 0] = 0$ and the square of an indicator function is just itself.

Now take the expectation conditional on $x_i$:
$$\begin{aligned}
E[s_i(\theta_o)s_i(\theta_o)'|x_i] &= \{\tau^2E(1[u_i \ge 0]|x_i) + (1-\tau)^2E(1[u_i < 0]|x_i)\}x_i'x_i \\
&= [\tau^2(1-\tau) + (1-\tau)^2\tau]x_i'x_i \\
&= \tau(1-\tau)x_i'x_i,
\end{aligned}$$
where we use the fact that $E(1[u_i < 0]|x_i) = P(y_i < x_i\theta_o|x_i) = \tau$; see the discussion below equation (12.110). Now apply iterated expectations to get (12.115).
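The key step, that the squared weight $\{\tau 1[u \ge 0] - (1-\tau)1[u < 0]\}^2$ has mean $\tau(1-\tau)$ when $P(u < 0) = \tau$, can be checked by a quick simulation. This is only an illustrative sketch: the error distribution (a shifted uniform), the sample size, and the seed are assumptions chosen for convenience.

```python
import random

def score_weight_moments(tau, n=100_000, seed=12345):
    """Simulate u with P(u < 0) = tau and return the sample mean and
    sample second moment of w = tau*1[u >= 0] - (1 - tau)*1[u < 0]."""
    rng = random.Random(seed)
    total, total_sq = 0.0, 0.0
    for _ in range(n):
        u = rng.random() - tau          # uniform on [-tau, 1 - tau), so P(u < 0) = tau
        w = tau if u >= 0 else -(1.0 - tau)
        total += w
        total_sq += w * w
    return total / n, total_sq / n

tau = 0.25
mean_w, mean_w_sq = score_weight_moments(tau)
```

The first moment should be close to zero (the score has zero mean) and the second moment close to $\tau(1-\tau)$, which is the scalar multiplying $x_i'x_i$ above.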
12.15. a. $\hat\theta$ (approximately) solves the first order condition
$$\sum_{i=1}^N\sum_{t=1}^T -x_{it}'\{\tau 1[y_{it} - x_{it}\hat\theta \ge 0] - (1-\tau)1[y_{it} - x_{it}\hat\theta < 0]\} = 0,$$
so the score function for time period $t$ is
$$s_{it}(\theta) = -x_{it}'\{\tau 1[y_{it} - x_{it}\theta \ge 0] - (1-\tau)1[y_{it} - x_{it}\theta < 0]\}$$
and the score for random draw $i$ is
$$s_i(\theta) = \sum_{t=1}^T s_{it}(\theta).$$


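b. We have to show that the scores $\{s_{it}(\theta_o): t = 1,\ldots,T\}$ in part a are serially uncorrelated. Now
$$s_{it}(\theta_o) = -x_{it}'\{\tau 1[u_{it} \ge 0] - (1-\tau)1[u_{it} < 0]\}$$
and, under dynamic completeness of the quantile,

$$\begin{aligned}
E\{\tau 1[u_{it} \ge 0] - (1-\tau)1[u_{it} < 0]\,|\,x_{it},y_{i,t-1},\ldots,y_{i1},x_{i1}\} &= E\{\tau 1[u_{it} \ge 0] - (1-\tau)1[u_{it} < 0]\,|\,x_{it}\} \\
&= \tau(1-\tau) - (1-\tau)\tau = 0.
\end{aligned}$$

Therefore,
$$E[s_{it}(\theta_o)|x_{it},y_{i,t-1},\ldots,y_{i1},x_{i1}] = -x_{it}'E\{\tau 1[u_{it} \ge 0] - (1-\tau)1[u_{it} < 0]\,|\,x_{it},y_{i,t-1},\ldots,y_{i1},x_{i1}\} = 0,$$
and it follows that if $r < t$, $s_{ir}(\theta_o)$ is uncorrelated with $s_{it}(\theta_o)$. Therefore,

$$B_o = \sum_{t=1}^T E[s_{it}(\theta_o)s_{it}(\theta_o)'] = \tau(1-\tau)\sum_{t=1}^T E(x_{it}'x_{it})$$

and a consistent estimator is

$$\hat B = \tau(1-\tau)N^{-1}\sum_{i=1}^N\sum_{t=1}^T x_{it}'x_{it}.$$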


c. This follows in a way similar to the cross section case. Now

$$a(\theta) = \sum_{t=1}^T E[s_{it}(\theta)]$$
and we need its Jacobian. We use $E[s_{it}(\theta)] = E\{E[s_{it}(\theta)|x_{it}]\}$ for each $t$, and then, just as in Section 12.10.2,
$$\nabla_\theta E[s_{it}(\theta)|x_{it}] = f_{u_t}(x_{it}(\theta-\theta_o)|x_{it})x_{it}'x_{it}.$$

Then

$$A_o = \sum_{t=1}^T E\{\nabla_\theta E[s_{it}(\theta_o)|x_{it}]\} = \sum_{t=1}^T E[f_{u_t}(0|x_{it})x_{it}'x_{it}].$$


12.16. a. Because $\mathrm{Med}(y_2|z) = z\pi_2$, we would use LAD.

b. From
$$y_1 = z_1\delta_1 + \alpha_1y_2 + u_1$$
we have

$$\begin{aligned}
\mathrm{Med}(y_1|y_2,z) &= z_1\delta_1 + \alpha_1y_2 + \mathrm{Med}(u_1|y_2,z) \\
&= z_1\delta_1 + \alpha_1y_2 + \rho_1v_2 \\
&= z_1\delta_1 + \alpha_1y_2 + \rho_1(y_2 - z\pi_2).
\end{aligned}$$
We can use a control function approach but based on LAD. So, in the first stage, estimate $\pi_2$ by LAD and compute, for each $i$,
$$\hat v_{i2} = y_{i2} - z_i\hat\pi_2.$$

Then use LAD in the second stage. Using dummy arguments of optimization, solve

$$\min_{d_1,a_1,r_1}\sum_{i=1}^N |y_{i1} - z_{i1}d_1 - a_1y_{i2} - r_1\hat v_{i2}|$$

to get $\hat\delta_1$, $\hat\alpha_1$, and $\hat\rho_1$. These estimators are generally consistent by the two-step estimation result discussed in Section 12.4.1.


c. It is natural to use the LAD $t$ statistic for $\hat\rho_1$ from the control function procedure in part b. We know from Chapter 6 that if we were using OLS in both stages then we could ignore the first-stage estimation of $\pi_2$ under $H_0: \rho_1 = 0$. That seems very likely the case here, too, but it does not follow from the results presented in the text (which assume smooth objective functions with nonsingular expected Hessians).

d. As mentioned in part c, an analytical calculation requires an extended set of tools, such as those in Newey and McFadden (1994). A computationally intensive solution is to bootstrap the two-step estimation method (being sure to recompute $\hat\pi_2$ with every bootstrap sample in order to account for its sampling distribution).

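12.17. a. We use a mean value expansion, similar to the delta method from Chapter 3 but now allowing for the randomness of $w_i$. By a mean value expansion, we can write

$$N^{-1/2}\sum_{i=1}^N g(w_i,\hat\theta) = N^{-1/2}\sum_{i=1}^N g(w_i,\theta_o) + \left(N^{-1}\sum_{i=1}^N \ddot G_i\right)\sqrt N(\hat\theta-\theta_o),$$
where $\ddot G_i$ is the $M\times P$ Jacobian of $g(w_i,\theta)$ evaluated at mean values between $\theta_o$ and $\hat\theta$. Now, because $\sqrt N(\hat\theta-\theta_o)\overset{d}{\to}\mathrm{Normal}(0,A_o^{-1}B_oA_o^{-1})$, it follows that $\sqrt N(\hat\theta-\theta_o) = O_p(1)$. Further, by Lemma 12.1, $N^{-1}\sum_{i=1}^N \ddot G_i\overset{p}{\to}E[\nabla_\theta g(w_i,\theta_o)]\equiv G_o$ (the mean values all converge in probability to $\theta_o$). Therefore,
$$\left(N^{-1}\sum_{i=1}^N \ddot G_i\right)\sqrt N(\hat\theta-\theta_o) = G_o\sqrt N(\hat\theta-\theta_o) + o_p(1),$$
and so

$$N^{-1/2}\sum_{i=1}^N g(w_i,\hat\theta) = N^{-1/2}\sum_{i=1}^N g(w_i,\theta_o) + G_o\sqrt N(\hat\theta-\theta_o) + o_p(1).$$

Because $\sqrt N(\hat\theta-\theta_o) = -N^{-1/2}\sum_{i=1}^N A_o^{-1}s_i(\theta_o) + o_p(1)$, we can write
$$\sqrt N\hat\delta = N^{-1/2}\sum_{i=1}^N g(w_i,\hat\theta) = N^{-1/2}\sum_{i=1}^N [g(w_i,\theta_o) - G_oA_o^{-1}s_i(\theta_o)] + o_p(1)$$
or, subtracting $\sqrt N\delta_o$ from both sides,

$$\sqrt N(\hat\delta-\delta_o) = N^{-1/2}\sum_{i=1}^N [g(w_i,\theta_o) - \delta_o - G_oA_o^{-1}s_i(\theta_o)] + o_p(1).$$

Now

$$E[g(w_i,\theta_o) - \delta_o - G_oA_o^{-1}s_i(\theta_o)] = E[g(w_i,\theta_o)] - \delta_o - G_oA_o^{-1}E[s_i(\theta_o)] = \delta_o - \delta_o = 0.$$

Therefore, by the CLT for i.i.d. sequences,
$$\sqrt N(\hat\delta-\delta_o)\overset{d}{\to}\mathrm{Normal}(0,D_o)$$
where
$$D_o = \mathrm{Var}(g_i - \delta_o - G_oA_o^{-1}s_i),$$
where hopefully the shorthand is clear. This differs from the usual delta method result because the randomness in $g_i = g_i(\theta_o)$ must be accounted for.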


b. We assume we have $\hat A$ consistent for $A_o$. By the usual arguments, $\hat G = N^{-1}\sum_{i=1}^N \nabla_\theta g(w_i,\hat\theta)$ is consistent for $G_o$. Then
$$\hat D = N^{-1}\sum_{i=1}^N (\hat g_i - \hat\delta - \hat G\hat A^{-1}\hat s_i)(\hat g_i - \hat\delta - \hat G\hat A^{-1}\hat s_i)'$$
is consistent for $D_o$, where the "^" denotes evaluation at $\hat\theta$.

c. Using the shorthand notation, if $E(s_i|x_i) = 0$ then $g_i$ is uncorrelated with $s_i$ because the premise of the problem is that $g_i$ is a function of $x_i$. Therefore, $(g_i - \delta_o)$ is uncorrelated with $G_oA_o^{-1}s_i$, which means
$$\begin{aligned}
D_o &= \mathrm{Var}(g_i - \delta_o - G_oA_o^{-1}s_i) \\
&= \mathrm{Var}(g_i - \delta_o) + \mathrm{Var}(G_oA_o^{-1}s_i) \\
&= \mathrm{Var}(g_i) + G_oA_o^{-1}B_oA_o^{-1}G_o' \\
&= \mathrm{Var}(g_i) + G_o\{\mathrm{Avar}[\sqrt N(\hat\theta-\theta_o)]\}G_o',
\end{aligned}$$

which is what we wanted to show.

Solutions to Chapter 13 Problems


13.1. No. We know that $\theta_o$ solves
$$\max_{\theta\in\Theta} E[\log f(y_i|x_i;\theta)],$$
where the expectation is over the joint distribution of $(x_i,y_i)$. Therefore, because $\exp(\cdot)$ is an increasing function, $\theta_o$ also maximizes $\exp\{E[\log f(y_i|x_i;\theta)]\}$ over $\Theta$. The problem is that the expectation and the exponential function cannot be interchanged: $E[f(y_i|x_i;\theta)]\ne\exp\{E[\log f(y_i|x_i;\theta)]\}$. In fact, Jensen's inequality tells us that
$$E[f(y_i|x_i;\theta)] \ge \exp\{E[\log f(y_i|x_i;\theta)]\}.$$
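A simple numerical illustration of 13.1, under assumptions not taken from the text: a Bernoulli model with no covariates, $f(y;\theta) = \theta^y(1-\theta)^{1-y}$, and a hypothetical true success probability of 0.7. The expected log density is maximized at the true value, while the expected density is maximized at the boundary of the grid.

```python
import math

P_TRUE = 0.7  # assumed true success probability, for illustration only

def expected_density(theta, p=P_TRUE):
    """E[f(y; theta)] when y ~ Bernoulli(p)."""
    return p * theta + (1 - p) * (1 - theta)

def expected_log_density(theta, p=P_TRUE):
    """E[log f(y; theta)] when y ~ Bernoulli(p)."""
    return p * math.log(theta) + (1 - p) * math.log(1 - theta)

grid = [k / 100 for k in range(1, 100)]          # theta in {0.01, ..., 0.99}
theta_max_loglik = max(grid, key=expected_log_density)
theta_max_lik = max(grid, key=expected_density)

# Jensen's inequality: E[f] >= exp(E[log f]) at every theta on the grid.
jensen_holds = all(expected_density(t) >= math.exp(expected_log_density(t))
                   for t in grid)
```

Here the two maximizers differ, which is the sense in which the expectation and the exponential cannot be interchanged.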
13.2. a. Because
$$f(y|x_i) = (2\pi\sigma_o^2)^{-1/2}\exp[-(y - m(x_i,\beta_o))^2/(2\sigma_o^2)],$$
it follows that for observation $i$ the log likelihood is
$$\ell_i(\beta,\sigma^2) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}[y_i - m(x_i,\beta)]^2.$$

Only the last of these terms depends on $\beta$. Further, for any $\sigma^2 > 0$, maximizing $\sum_{i=1}^N \ell_i(\beta,\sigma^2)$ with respect to $\beta$ is the same as minimizing

$$\sum_{i=1}^N [y_i - m(x_i,\beta)]^2,$$
which means the MLE $\hat\beta$ is the NLS estimator.
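b. First,
$$\nabla_\beta\ell_i(\beta,\sigma^2) = \nabla_\beta m(x_i,\beta)[y_i - m(x_i,\beta)]/\sigma^2;$$
note that $\nabla_\beta m(x_i,\beta)$ is $1\times P$. Next,
$$\frac{\partial\ell_i(\beta,\sigma^2)}{\partial\sigma^2} = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}[y_i - m(x_i,\beta)]^2.$$

For notational simplicity, define the residual function $u_i(\beta) = y_i - m(x_i,\beta)$. Then the score is

$$s_i(\theta) = \begin{pmatrix}\nabla_\beta m_i(\beta)'u_i(\beta)/\sigma^2 \\ -\dfrac{1}{2\sigma^2} + \dfrac{1}{2\sigma^4}[u_i(\beta)]^2\end{pmatrix},$$
where $\nabla_\beta m_i(\beta)\equiv\nabla_\beta m(x_i,\beta)$.

Define the errors as $u_i\equiv u_i(\beta_o)$, so that $E(u_i|x_i) = 0$ and $E(u_i^2|x_i) = \mathrm{Var}(y_i|x_i) = \sigma_o^2$. Then, since $\nabla_\beta m_i(\beta_o)$ is a function of $x_i$, it is easily seen that $E[s_i(\theta_o)|x_i] = 0$. Note that we only use the fact that $E(y_i|x_i) = m(x_i,\beta_o)$ and $\mathrm{Var}(y_i|x_i) = \sigma_o^2$ in showing this. In other words, only the first two conditional moments of $y_i$ need to be correctly specified; nothing else about the normal distribution is used.

The equation used to obtain $\hat\sigma^2$ is

$$\sum_{i=1}^N\left(-\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4}[y_i - m(x_i,\hat\beta)]^2\right) = 0,$$
where $\hat\beta$ is the nonlinear least squares estimator. Solving gives
$$\hat\sigma^2 = N^{-1}\sum_{i=1}^N \hat u_i^2,$$
where $\hat u_i\equiv y_i - m(x_i,\hat\beta)$. Thus, the MLE of $\sigma^2$ is the sum of squared residuals divided by $N$. In practice, $N$ is often replaced with $N - P$ as a degrees-of-freedom adjustment, but this makes no difference as $N\to\infty$.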
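As a quick numerical check of the last step, the sketch below takes a made-up residual vector, forms $\hat\sigma^2 = N^{-1}\sum\hat u_i^2$, and evaluates both the first order condition and the log likelihood at and around that value. The residuals and function names are hypothetical.

```python
import math

def loglik_sigma2(resid, sigma2):
    """Sum over i of the normal log likelihood, viewed as a function of sigma^2
    with the residuals held fixed."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * math.log(sigma2)
               - u * u / (2 * sigma2) for u in resid)

def foc_sigma2(resid, sigma2):
    """Sum over i of -1/(2 sigma^2) + u_i^2 / (2 sigma^4)."""
    return sum(-1.0 / (2 * sigma2) + u * u / (2 * sigma2 ** 2) for u in resid)

resid = [0.5, -1.2, 0.3, 0.9, -0.4]                    # made-up NLS residuals
sigma2_hat = sum(u * u for u in resid) / len(resid)    # SSR / N
foc_at_hat = foc_sigma2(resid, sigma2_hat)
```

The first order condition should be zero (up to rounding) at `sigma2_hat`, and the log likelihood should be lower at nearby values of $\sigma^2$.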


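c. The derivations are a bit tedious but fairly straightforward:
$$H_i(\theta) = \begin{pmatrix}-\nabla_\beta m_i(\beta)'\nabla_\beta m_i(\beta)/\sigma^2 + \nabla_\beta^2m_i(\beta)u_i(\beta)/\sigma^2 & -\nabla_\beta m_i(\beta)'u_i(\beta)/\sigma^4 \\ -\nabla_\beta m_i(\beta)u_i(\beta)/\sigma^4 & \dfrac{1}{2\sigma^4} - \dfrac{1}{\sigma^6}[u_i(\beta)]^2\end{pmatrix},$$
where $\nabla_\beta^2m_i(\beta)$ is the $P\times P$ Hessian of $m_i(\beta)$.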
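d. From part c and $E(u_i|x_i) = 0$, the off-diagonal blocks are zero. Further,
$$E[\nabla_\beta m_i(\beta_o)'\nabla_\beta m_i(\beta_o)/\sigma_o^2 - \nabla_\beta^2m_i(\beta_o)u_i/\sigma_o^2\,|\,x_i] = \nabla_\beta m_i(\beta_o)'\nabla_\beta m_i(\beta_o)/\sigma_o^2.$$

Because $E(u_i^2|x_i) = \sigma_o^2$,
$$E\left[\frac{u_i^2}{\sigma_o^6} - \frac{1}{2\sigma_o^4}\,\Big|\,x_i\right] = \frac{1}{\sigma_o^4} - \frac{1}{2\sigma_o^4} = \frac{1}{2\sigma_o^4}.$$

Therefore,

$$-E[H_i(\theta_o)|x_i] = \begin{pmatrix}\nabla_\beta m_i(\beta_o)'\nabla_\beta m_i(\beta_o)/\sigma_o^2 & 0 \\ 0 & \dfrac{1}{2\sigma_o^4}\end{pmatrix},$$

where we again use $E(u_i|x_i) = 0$ and $E(u_i^2|x_i) = \sigma_o^2$.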
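e. To show that $-E[H_i(\theta_o)|x_i]$ equals $E[s_i(\theta_o)s_i(\theta_o)'|x_i]$, we need to know that, with $u_i$ defined as above, $E(u_i^3|x_i) = 0$, which can be used, along with the zero mean and constant conditional variance, to show

$$E[s_i(\theta_o)s_i(\theta_o)'|x_i] = \begin{pmatrix}\nabla_\beta m_i(\beta_o)'\nabla_\beta m_i(\beta_o)/\sigma_o^2 & 0 \\ 0 & E\left[\left(-\dfrac{1}{2\sigma_o^2} + \dfrac{u_i^2}{2\sigma_o^4}\right)^2\Big|\,x_i\right]\end{pmatrix}.$$

Further, $E(u_i^4|x_i) = 3\sigma_o^4$, and so

$$E\left[\left(-\frac{1}{2\sigma_o^2} + \frac{u_i^2}{2\sigma_o^4}\right)^2\Big|\,x_i\right] = \frac{1}{4\sigma_o^4} + \frac{3\sigma_o^4}{4\sigma_o^8} - \frac{2\sigma_o^2}{4\sigma_o^6} = \frac{1}{2\sigma_o^4}.$$

Thus, we have shown $-E[H_i(\theta_o)|x_i] = E[s_i(\theta_o)s_i(\theta_o)'|x_i]$.
(13.99)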


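f. From general MLE, we know that $\mathrm{Avar}\sqrt N(\hat\beta-\beta_o)$ is the $P\times P$ upper left hand block of $\{E[A_i(\theta_o)]\}^{-1}$, where $A_i(\theta_o)$ is the matrix in (13.99). Because this matrix is block diagonal, it is easily seen that
$$\mathrm{Avar}\sqrt N(\hat\beta-\beta_o) = \sigma_o^2\{E[\nabla_\beta m_i(\beta_o)'\nabla_\beta m_i(\beta_o)]\}^{-1},$$
and this is consistently estimated by
$$\hat\sigma^2\left(N^{-1}\sum_{i=1}^N \nabla_\beta\hat m_i'\nabla_\beta\hat m_i\right)^{-1}, \qquad (13.100)$$
which means that $\widehat{\mathrm{Avar}}(\hat\beta)$ is (13.100) divided by $N$, or
$$\widehat{\mathrm{Avar}}(\hat\beta) = \hat\sigma^2\left(\sum_{i=1}^N \nabla_\beta\hat m_i'\nabla_\beta\hat m_i\right)^{-1}. \qquad (13.101)$$

If the model is linear, $\nabla_\beta\hat m_i = x_i$, and we obtain exactly the asymptotic variance estimator for the OLS estimator under homoskedasticity.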


13.3. a. The conditional log-likelihood for observation $i$ is
$$\ell_i(\theta) = y_i\log[G(x_i,\theta)] + (1-y_i)\log[1 - G(x_i,\theta)].$$
b. The derivation for the probit case in Example 13.1 extends immediately:

$$\begin{aligned}
s_i(\theta) &= y_i\nabla_\theta G(x_i,\theta)'/G(x_i,\theta) - (1-y_i)\nabla_\theta G(x_i,\theta)'/[1 - G(x_i,\theta)] \\
&= \nabla_\theta G(x_i,\theta)'[y_i - G(x_i,\theta)]/\{G(x_i,\theta)[1 - G(x_i,\theta)]\}.
\end{aligned}$$

If we plug in $\theta_o$ for $\theta$ and take the expectation conditional on $x_i$ we get $E[s_i(\theta_o)|x_i] = 0$ because $E[y_i - G(x_i,\theta_o)|x_i] = 0$, and the functions multiplying $y_i - G(x_i,\theta_o)$ depend only on $x_i$.

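c. We need to evaluate the score and the expected Hessian with respect to the full set of parameters, but then evaluate these at the restricted estimates. Now,
$$\nabla_\theta G(x_i,\beta,0) = \phi(x_i\beta)[x_i,(x_i\beta)^2,(x_i\beta)^3],$$
a $1\times(K+2)$ vector. Let $\tilde\beta$ denote the probit estimates of $\beta$, obtained under the null. The score for observation $i$, evaluated under the null estimates, is the $(K+2)\times 1$ vector

$$\begin{aligned}
s_i(\tilde\theta) &= \nabla_\theta G(x_i,\tilde\beta,0)'[y_i - \Phi(x_i\tilde\beta)]/\{\Phi(x_i\tilde\beta)[1 - \Phi(x_i\tilde\beta)]\} \\
&= \phi(x_i\tilde\beta)\tilde z_i'[y_i - \Phi(x_i\tilde\beta)]/\{\Phi(x_i\tilde\beta)[1 - \Phi(x_i\tilde\beta)]\},
\end{aligned}$$

where $\tilde z_i\equiv[x_i,(x_i\tilde\beta)^2,(x_i\tilde\beta)^3]$. The negative of the expected Hessian, evaluated under the null, is the $(K+2)\times(K+2)$ matrix

$$A(x_i,\tilde\theta) = [\phi(x_i\tilde\beta)]^2\tilde z_i'\tilde z_i/\{\Phi(x_i\tilde\beta)[1 - \Phi(x_i\tilde\beta)]\}.$$
These can be plugged into the second expression in equation (13.36) to obtain a nonnegative, well-behaved LM statistic. Simple algebra shows that the statistic can be computed as $N$ times the explained sum of squares from the regression

$$\frac{\tilde u_i}{\sqrt{\tilde\Phi_i(1-\tilde\Phi_i)}} \text{ on } \frac{\tilde\phi_ix_i}{\sqrt{\tilde\Phi_i(1-\tilde\Phi_i)}},\ \frac{\tilde\phi_i(x_i\tilde\beta)^2}{\sqrt{\tilde\Phi_i(1-\tilde\Phi_i)}},\ \frac{\tilde\phi_i(x_i\tilde\beta)^3}{\sqrt{\tilde\Phi_i(1-\tilde\Phi_i)}}, \quad i = 1,\ldots,N,$$
where "~" denotes evaluation at $(\tilde\beta,0)$ and $\tilde u_i = y_i - \tilde\Phi_i$. Under $H_0$, LM is distributed asymptotically as $\chi_2^2$.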
d. The variable addition version of the test is to estimate, in a second step, a probit model with response probability of the form

$$\Phi[x_i\beta + \delta_1(x_i\tilde\beta)^2 + \delta_2(x_i\tilde\beta)^3]$$
and then compute a Wald test of $H_0: \delta_1 = \delta_2 = 0$. As we discussed in Problem 12.5 in a related context, this is to be used only as a test. The estimates of $\delta_1$ and $\delta_2$ obtained by inserting $\tilde\beta$ into the square and cubic terms are generally inconsistent if at least one of $\delta_1$ and $\delta_2$ is different from zero.

13.4. If the density of $y$ given $x$ is correctly specified then $E[s(w,\theta_o)|x] = 0$. But then
$$E[a(x,\theta_o)s(w,\theta_o)|x] = a(x,\theta_o)E[s(w,\theta_o)|x] = 0,$$
which, of course, implies an unconditional expectation of zero. The only restriction on $a(x,\theta_o)$ would be to ensure the expected value is well defined (but this is usually just assumed, not verified).


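13.5. a. Because $s_i^g(\phi_o) = [G(\theta_o)']^{-1}s_i(\theta_o)$,

$$\begin{aligned}
E[s_i^g(\phi_o)s_i^g(\phi_o)'|x_i] &= E\{[G(\theta_o)']^{-1}s_i(\theta_o)s_i(\theta_o)'[G(\theta_o)]^{-1}\,|\,x_i\} \\
&= [G(\theta_o)']^{-1}E[s_i(\theta_o)s_i(\theta_o)'|x_i][G(\theta_o)]^{-1} \\
&= [G(\theta_o)']^{-1}A_i(\theta_o)[G(\theta_o)]^{-1},
\end{aligned}$$

where the last equality follows from the conditional information matrix equality.

b. In part a, we just replace $\theta_o$ with $\tilde\theta$ and $\phi_o$ with $\tilde\phi$:
$$\tilde A_i^g = [G(\tilde\theta)']^{-1}A_i(\tilde\theta)[G(\tilde\theta)]^{-1}\equiv\tilde G'^{-1}\tilde A_i\tilde G^{-1}.$$
c. The expected Hessian form of the statistic is given in the second part of equation (13.36), but where it depends on $\tilde s_i^g$ and $\tilde A_i^g$:

$$\begin{aligned}
LM_g &= \left(\sum_{i=1}^N \tilde s_i^g\right)'\left(\sum_{i=1}^N \tilde A_i^g\right)^{-1}\left(\sum_{i=1}^N \tilde s_i^g\right) \\
&= \left(\sum_{i=1}^N \tilde s_i\right)'\tilde G^{-1}\tilde G\left(\sum_{i=1}^N \tilde A_i\right)^{-1}\tilde G'\tilde G'^{-1}\left(\sum_{i=1}^N \tilde s_i\right) \\
&= \left(\sum_{i=1}^N \tilde s_i\right)'\left(\sum_{i=1}^N \tilde A_i\right)^{-1}\left(\sum_{i=1}^N \tilde s_i\right) = LM.
\end{aligned}$$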


13.6. a. No, for two reasons. First, just specifying a distribution of $y_{it}$ given $x_{it}$ says nothing, in general, about the distribution of $y_{it}$ given $x_i\equiv(x_{i1},\ldots,x_{iT})$. We could assume these two are the same, which is the strict exogeneity assumption. Second, even under strict exogeneity, we would have to specify something about joint distributions (perhaps via conditional distributions) involving different time periods. We could assume independence (conditional on $x_i$) or make a dynamic completeness assumption. Either way, without substantially more assumptions, we cannot derive the distribution of $y_i$ given $x_i$.

b. This is given in a more general case in equation (18.69) in Chapter 18. It can be derived easily from Example 13.2, which gives $\ell_i(\theta)$ for the cross section case:
$$\ell_i(\theta) = \sum_{t=1}^T [y_{it}x_{it}\theta - \exp(x_{it}\theta)]\equiv\sum_{t=1}^T \ell_{it}(\theta).$$

Taking the gradient and transposing gives

$$s_i(\theta) = \sum_{t=1}^T x_{it}'[y_{it} - \exp(x_{it}\theta)]\equiv\sum_{t=1}^T s_{it}(\theta).$$


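c. First, we need the Hessian for each $i$, which is easily obtained as $\nabla_\theta s_i(\theta)$:
$$H_i(\theta) = -\sum_{t=1}^T \exp(x_{it}\theta)x_{it}'x_{it},$$
which, in this example, does not depend on the $y_{it}$ (see Problem 13.12 for the notion of a canonical link function). In particular, $A_i(\theta_o) = -E[H_i(\theta_o)|x_i] = -H_i(\theta_o)$. Therefore,

$$\hat A = N^{-1}\sum_{i=1}^N\sum_{t=1}^T \exp(x_{it}\hat\theta)x_{it}'x_{it},$$

where $\hat\theta$ is the partial MLE. Further,
$$\hat B = N^{-1}\sum_{i=1}^N s_i(\hat\theta)s_i(\hat\theta)',$$
and then $\mathrm{Avar}(\hat\theta)$ is estimated as
$$\left(\sum_{i=1}^N\sum_{t=1}^T \exp(x_{it}\hat\theta)x_{it}'x_{it}\right)^{-1}\left(\sum_{i=1}^N s_i(\hat\theta)s_i(\hat\theta)'\right)\left(\sum_{i=1}^N\sum_{t=1}^T \exp(x_{it}\hat\theta)x_{it}'x_{it}\right)^{-1}.$$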
d. If $E(y_{it}|x_{it},y_{i,t-1},x_{i,t-1},\ldots,y_{i1},x_{i1}) = E(y_{it}|x_{it})$ then
$$E[s_{it}(\theta_o)|x_{it},y_{i,t-1},x_{i,t-1},\ldots] = x_{it}'[E(y_{it}|x_{it},y_{i,t-1},x_{i,t-1},\ldots) - \exp(x_{it}\theta_o)] = 0.$$

As usual, this finding implies that $s_{it}(\theta_o)$ and $s_{ir}(\theta_o)$ are uncorrelated, $t\ne r$. Therefore,
$$B_o = \sum_{t=1}^T E[s_{it}(\theta_o)s_{it}(\theta_o)'] = \sum_{t=1}^T E(u_{it}^2x_{it}'x_{it}),$$
where $u_{it}\equiv y_{it} - E(y_{it}|x_{it}) = y_{it} - \exp(x_{it}\theta_o)$. Now, by the Poisson assumption, $E(u_{it}^2|x_{it}) = \mathrm{Var}(y_{it}|x_{it}) = \exp(x_{it}\theta_o)$. By iterated expectations,

$$B_o = \sum_{t=1}^T E[\exp(x_{it}\theta_o)x_{it}'x_{it}] = A_o.$$

(We have really just verified the conditional information matrix equality for each $t$ in the special case of Poisson regression with an exponential mean function.) Therefore, we can estimate $\mathrm{Avar}(\hat\theta)$ as

$$\left(\sum_{i=1}^N\sum_{t=1}^T \exp(x_{it}\hat\theta)x_{it}'x_{it}\right)^{-1},$$

which is exactly what we get by using pooled Poisson estimation and ignoring the time dimension.

13.7. a. The joint density is simply $g(y_1|y_2,x;\theta_o)\cdot h(y_2|x;\theta_o)$. The log-likelihood for observation $i$ is

$$\ell_i(\theta) = \log[g(y_{i1}|y_{i2},x_i;\theta)] + \log[h(y_{i2}|x_i;\theta)],$$

and we would use this in a standard MLE analysis (conditional on $x_i$).

b. First, we know that, for all $(y_{i2},x_i)$, $\theta_o$ maximizes $E[\ell_{i1}(\theta)|y_{i2},x_i]$. Because $r_{i2}$ is a function of $(y_{i2},x_i)$,

$$E[r_{i2}\ell_{i1}(\theta)|y_{i2},x_i] = r_{i2}E[\ell_{i1}(\theta)|y_{i2},x_i];$$

because $r_{i2}\ge 0$, $\theta_o$ maximizes $E[r_{i2}\ell_{i1}(\theta)|y_{i2},x_i]$ for all $(y_{i2},x_i)$, and therefore $\theta_o$ maximizes $E[r_{i2}\ell_{i1}(\theta)]$ by iterated expectations. Similarly, $\theta_o$ maximizes $E[\ell_{i2}(\theta)]$, and so it follows that $\theta_o$ maximizes $E[r_{i2}\ell_{i1}(\theta) + \ell_{i2}(\theta)]$. For identification, we have to assume or verify uniqueness.

c. The score is

$$s_i(\theta) = r_{i2}s_{i1}(\theta) + s_{i2}(\theta),$$

where $s_{i1}(\theta)\equiv\nabla_\theta\ell_{i1}(\theta)'$ and $s_{i2}(\theta)\equiv\nabla_\theta\ell_{i2}(\theta)'$. Therefore,

$$E[s_i(\theta_o)s_i(\theta_o)'] = E[r_{i2}s_{i1}(\theta_o)s_{i1}(\theta_o)'] + E[s_{i2}(\theta_o)s_{i2}(\theta_o)'] + E[r_{i2}s_{i1}(\theta_o)s_{i2}(\theta_o)'] + E[r_{i2}s_{i2}(\theta_o)s_{i1}(\theta_o)'].$$

Now by the usual conditional MLE theory, $E[s_{i1}(\theta_o)|y_{i2},x_i] = 0$ and, since $r_{i2}$ and $s_{i2}(\theta_o)$ are functions of $(y_{i2},x_i)$, it follows that $E[r_{i2}s_{i1}(\theta_o)s_{i2}(\theta_o)'|y_{i2},x_i] = 0$, and so its transpose also has zero conditional expectation. As usual, this implies zero unconditional expectation. We have shown
$$E[s_i(\theta_o)s_i(\theta_o)'] = E[r_{i2}s_{i1}(\theta_o)s_{i1}(\theta_o)'] + E[s_{i2}(\theta_o)s_{i2}(\theta_o)'].$$
Now, by the unconditional information matrix equality for the density $h(y_2|x;\theta)$,

$$E[s_{i2}(\theta_o)s_{i2}(\theta_o)'] = -E[H_{i2}(\theta_o)],$$
where $H_{i2}(\theta) = \nabla_\theta s_{i2}(\theta)$. Further, by the conditional IM equality for the density $g(y_1|y_2,x;\theta)$,
$$E[s_{i1}(\theta_o)s_{i1}(\theta_o)'|y_{i2},x_i] = -E[H_{i1}(\theta_o)|y_{i2},x_i], \qquad (13.102)$$
where $H_{i1}(\theta) = \nabla_\theta s_{i1}(\theta)$. Since $r_{i2}$ is a function of $(y_{i2},x_i)$, we can put $r_{i2}$ inside both expectations in (13.102). Then, by iterated expectations,
$$E[r_{i2}s_{i1}(\theta_o)s_{i1}(\theta_o)'] = -E[r_{i2}H_{i1}(\theta_o)].$$
Combining all the pieces, we have shown that

$$\begin{aligned}
E[s_i(\theta_o)s_i(\theta_o)'] &= -E[r_{i2}H_{i1}(\theta_o)] - E[H_{i2}(\theta_o)] \\
&= -\{E[r_{i2}\nabla_\theta s_{i1}(\theta_o) + \nabla_\theta s_{i2}(\theta_o)]\} \\
&= -E[\nabla_\theta^2\ell_i(\theta_o)]\equiv -E[H_i(\theta_o)].
\end{aligned}$$

So we have verified that an unconditional IM equality holds, which means we can estimate the asymptotic variance of $\sqrt N(\hat\theta-\theta_o)$ by estimating $\{-E[H_i(\theta_o)]\}^{-1}$.
d. From part c, one consistent estimator of $\mathrm{Avar}[\sqrt N(\hat\theta-\theta_o)]$ is
$$\left[-N^{-1}\sum_{i=1}^N (r_{i2}\hat H_{i1} + \hat H_{i2})\right]^{-1},$$
where the notation should be obvious. In some cases it may be simpler to use an expected Hessian form for each piece, which we can obtain by looking for consistent estimators of $-E[r_{i2}H_{i1}(\theta_o)]$ and $-E[H_{i2}(\theta_o)]$. By definition, $A_{i2}(\theta_o)\equiv -E[H_{i2}(\theta_o)|x_i]$, and so $E[A_{i2}(\theta_o)] = -E[H_{i2}(\theta_o)]$. By the usual law of large numbers argument,
$$N^{-1}\sum_{i=1}^N \hat A_{i2}\overset{p}{\to} -E[H_{i2}(\theta_o)].$$

Similarly, since $A_{i1}(\theta_o)\equiv -E[H_{i1}(\theta_o)|y_{i2},x_i]$, and $r_{i2}$ is a function of $(y_{i2},x_i)$, it follows that $E[r_{i2}A_{i1}(\theta_o)] = -E[r_{i2}H_{i1}(\theta_o)]$. Under general regularity conditions, $N^{-1}\sum_{i=1}^N r_{i2}\hat A_{i1}$ consistently estimates $-E[r_{i2}H_{i1}(\theta_o)]$. This completes what we needed to show.

Interestingly, even though we do not have a true conditional maximum likelihood problem, we can still use the conditional expectations of the Hessians, conditioned on different sets of variables ($(y_{i2},x_i)$ in one case, and $x_i$ in the other), to consistently estimate the asymptotic variance of the partial MLE.

e. (Bonus Question) Show that if we were able to use the entire random sample, the resulting conditional MLE would be more efficient than the partial MLE based on the selected sample.

Solution

We use a standard fact about positive definite matrices: if $A$ and $B$ are $P\times P$ positive definite matrices, then $A - B$ is p.s.d. if and only if $B^{-1} - A^{-1}$ is p.s.d. Now, as we showed in part d, the asymptotic variance of the partial MLE is $\{E[r_{i2}A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1}$. If we could use the entire random sample for both terms, the asymptotic variance would be $\{E[A_{i1}(\theta_o) + A_{i2}(\theta_o)]\}^{-1}$. But

$$E[A_{i1}(\theta_o) + A_{i2}(\theta_o)] - E[r_{i2}A_{i1}(\theta_o) + A_{i2}(\theta_o)] = E[(1-r_{i2})A_{i1}(\theta_o)],$$
which is p.s.d. because $A_{i1}(\theta_o)$ is p.s.d. and $1 - r_{i2}\ge 0$. Intuitively, the larger is $P(r_{i2} = 1)$ the smaller is the efficiency difference.


13.8. a. This is similar to Problem 12.12 for nonlinear regression; here we are specifying a full conditional distribution. We can use the results in Section 12.4.2:

$$\mathrm{Avar}[\sqrt N(\hat\theta-\theta_o)] = A_o^{-1}(B_o + F_oC_oF_o')A_o^{-1} = A_o^{-1} + A_o^{-1}F_oC_oF_o'A_o^{-1},$$
where $C_o = E[r_i(\gamma_o)r_i(\gamma_o)']$. We also use the information matrix equality, $A_o = B_o$, where
$$A_o = E[s_i(\theta_o;\gamma_o)s_i(\theta_o;\gamma_o)'] = -E[H_i(\theta_o;\gamma_o)]$$
and

$$s_i(\theta;\gamma) = \nabla_\theta\log f[y_i|x_i,g(w_i,\gamma);\theta]', \qquad H_i(\theta;\gamma) = \nabla_\theta s_i(\theta;\gamma).$$

To use the formula we need to characterize $F_o$. First,

$$\nabla_\gamma s_i(\theta;\gamma) = \nabla_\gamma\{\nabla_\theta\log f[y_i|x_i,g(w_i,\gamma);\theta]'\},$$
which generally requires using the chain rule to compute. Write

$$k(y,x,g;\theta)\equiv\nabla_\theta\log f(y|x,g;\theta)'.$$

Then

$$\nabla_\gamma s_i(\theta;\gamma) = \nabla_gk(y_i,x_i,g(w_i,\gamma),\theta)\nabla_\gamma g(w_i,\gamma)$$
and

$$F_o = E[\nabla_gk(y_i,x_i,g(w_i,\gamma_o),\theta_o)\nabla_\gamma g(w_i,\gamma_o)].$$

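b. Generally,
$$\widehat{\mathrm{Avar}}[\sqrt N(\hat\theta-\theta_o)] = \hat A^{-1} + \hat A^{-1}\hat F\hat C\hat F'\hat A^{-1},$$
where $\hat A$ is one of the various choices for estimating the information matrix, evaluated at $\hat\theta$ and $\hat\gamma$,

$$\hat C = N^{-1}\sum_{i=1}^N r_i(\hat\gamma)r_i(\hat\gamma)',$$

and
$$\hat F = N^{-1}\sum_{i=1}^N \nabla_gk(y_i,x_i,g(w_i,\hat\gamma),\hat\theta)\nabla_\gamma g(w_i,\hat\gamma).$$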
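c. It applies directly where the scalar $\rho_o$ plays the role of $\gamma_o$. The score for this problem (with respect to $\theta$) is

$$s_i(\theta;\gamma) = \frac{\phi(x_i\beta + \rho g_i(\gamma))}{\Phi(x_i\beta + \rho g_i(\gamma))[1 - \Phi(x_i\beta + \rho g_i(\gamma))]}\begin{pmatrix}x_i' \\ g_i(\gamma)\end{pmatrix}[y_i - \Phi(x_i\beta + \rho g_i(\gamma))],$$

where $g_i(\gamma) = h_i - z_i\gamma$. The full Jacobian of $s_i(\theta;\gamma)$ with respect to $\gamma$ is complicated, but it is easy to see it has the form
$$\nabla_\gamma s_i(\theta;\gamma) = L(x_i,z_i,h_i;\theta,\gamma)[y_i - \Phi(x_i\beta + \rho g_i(\gamma))] + \rho\cdot\frac{[\phi(x_i\beta + \rho g_i(\gamma))]^2}{\Phi(x_i\beta + \rho g_i(\gamma))[1 - \Phi(x_i\beta + \rho g_i(\gamma))]}\begin{pmatrix}x_i' \\ g_i(\gamma)\end{pmatrix}z_i.$$

When evaluated at the true values $\theta_o$ and $\gamma_o$, the first term has zero expectation conditional on $(x_i,z_i,h_i)$ because
$$E(y_i|x_i,z_i,h_i) = \Phi(x_i\beta_o + \rho_og_i(\gamma_o)).$$

So $F_o$ can be estimated by plugging in the estimators and averaging the second term across $i$.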


d. When $\rho_o = 0$, the second term in $\nabla_\gamma s_i(\theta_o;\gamma_o)$ is zero, and so
$$E[\nabla_\gamma s_i(\theta_o;\gamma_o)|x_i,z_i,h_i] = 0,$$
which means condition (12.37) holds and $F_o = 0$. This implies, from part a,
$$\mathrm{Avar}[\sqrt N(\hat\theta-\theta_o)] = A_o^{-1}.$$
e. Because $\rho_o$ is an element of $\theta_o$, for testing $H_0: \rho_o = 0$ we can ignore the fact that $\gamma_o$ has been estimated in the first stage. In other words, when we run probit of $y_i$ on $x_i$, $\hat g_i$, $i = 1,\ldots,N$, where $\hat g_i = h_i - z_i\hat\gamma$, we can use a standard probit $t$ statistic on $\hat g_i$.


13.9. a. Under the Markov assumption, the joint density of $(y_{i0},\ldots,y_{iT})$ is given by

$$f_T(y_T|y_{T-1})\cdot f_{T-1}(y_{T-1}|y_{T-2})\cdots f_1(y_1|y_0)\cdot f_0(y_0),$$

so we would need to model $f_0(y_0)$ to obtain a model of the joint density.

b. The log likelihood
$$\ell_i(\theta) = \sum_{t=1}^T \log[f_t(y_{it}|y_{i,t-1};\theta)]$$
is the conditional log-likelihood for the density of $(y_{i1},\ldots,y_{iT})$ given $y_{i0}$, and so the usual theory of conditional maximum likelihood applies. In practice, this is MLE pooled across $i$ and $t$.

c. Because we have the density of $(y_{i1},\ldots,y_{iT})$ given $y_{i0}$, we can use any of the three asymptotic variance estimators implied by the information matrix equality. However, we can also use the simplifications due to dynamic completeness of each conditional density. Let $s_{it}(\theta) = \nabla_\theta\log[f_t(y_{it}|y_{i,t-1};\theta)]'$, $H_{it}(\theta) = \nabla_\theta s_{it}(\theta)$ and $A_{it}(\theta_o) = -E[H_{it}(\theta_o)|y_{i,t-1}]$, $t = 1,\ldots,T$. Then $\mathrm{Avar}\sqrt N(\hat\theta-\theta_o)$ is consistently estimated using the inverse of any of the three matrices in equation (13.50). If we have a canned package that computes a particular MLE, we can just use any of the usual asymptotic variance estimates obtained from the pooled MLE.

13.10. a. Because of conditional independence, the joint density [conditional on $(x,c)$] is the product of the marginal densities [conditional on $(x,c)$]:

$$f(y_1,y_2,\ldots,y_G|x,c) = \prod_{g=1}^G f_g(y_g|x,c).$$

b. Let $g(y_1,\ldots,y_G|x)$ be the joint density of $y_i$ given $x_i = x$. Then

$$g(y_1,\ldots,y_G|x) = \int f(y_1,\ldots,y_G|x,c)h(c|x)\,dc.$$
c. The density $g(y_1,\ldots,y_G|x)$ is now
$$g(y_1,\ldots,y_G|x;\gamma_o,\delta_o) = \int f(y_1,\ldots,y_G|x,c;\gamma_o)h(c|x;\delta_o)\,dc = \int\prod_{g=1}^G f_g(y_g|x,c;\gamma_o^g)h(c|x;\delta_o)\,dc,$$

and so the log likelihood for observation $i$ is
$$\log[g(y_{i1},\ldots,y_{iG}|x_i;\gamma,\delta)] = \log\left[\int\prod_{g=1}^G f_g(y_{ig}|x_i,c;\gamma^g)h(c|x_i;\delta)\,dc\right].$$

d. This setup has some features in common with a linear SUR model, although here the correlation across equations is assumed to come through a single common component, $c$. Because of computational issues with general nonlinear models (especially if $G$ is large and some of the models are for qualitative response), one probably needs to restrict the cross equation correlation somehow.


13.11. a. For each $t\ge 1$, the density of $y_{it}$ given $y_{i,t-1} = y_{t-1}$, $y_{i,t-2} = y_{t-2}$, $\ldots$, $y_{i0} = y_0$ and $c_i = c$ is
$$f_t(y_t|y_{t-1},c) = (2\pi\sigma_e^2)^{-1/2}\exp[-(y_t - \rho y_{t-1} - c)^2/(2\sigma_e^2)].$$

Therefore, the density of $(y_{i1},\ldots,y_{iT})$ given $y_{i0} = y_0$ and $c_i = c$ is obtained by the product of these densities:
$$\prod_{t=1}^T (2\pi\sigma_e^2)^{-1/2}\exp[-(y_t - \rho y_{t-1} - c)^2/(2\sigma_e^2)].$$

If we plug in the data for observation $i$ and take the log we get
$$\sum_{t=1}^T \{-(1/2)\log(\sigma_e^2) - (y_{it} - \rho y_{i,t-1} - c_i)^2/(2\sigma_e^2)\} = -(T/2)\log(\sigma_e^2) - \sum_{t=1}^T (y_{it} - \rho y_{i,t-1} - c_i)^2/(2\sigma_e^2),$$
where we have dropped the term that does not depend on the parameters.

It is not a good idea to "estimate" the $c_i$ along with $\rho$ and $\sigma_e^2$, as the incidental parameters problem causes inconsistency (severe in some cases) in the estimator of $\rho$.
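b. If we write $c_i = \alpha_0 + \alpha_1 y_{i0} + a_i$, under the maintained assumption, then the density of $(y_{i1}, \ldots, y_{iT})$ given $(y_{i0} = y_0, a_i = a)$ is

$$\prod_{t=1}^{T} (2\pi\sigma_e^2)^{-1/2} \exp[-(y_t - \rho y_{t-1} - \alpha_0 - \alpha_1 y_0 - a)^2/(2\sigma_e^2)].$$

Now, to get the density conditional on $y_{i0} = y_0$ only, we integrate this density over the density of $a_i$ given $y_{i0} = y_0$. But $a_i$ and $y_{i0}$ are independent, and $a_i \sim \mathrm{Normal}(0, \sigma_a^2)$. So the density of $(y_{i1}, \ldots, y_{iT})$ given $y_{i0} = y_0$ is

$$\int_{-\infty}^{\infty} \left(\prod_{t=1}^{T} (2\pi\sigma_e^2)^{-1/2} \exp[-(y_t - \rho y_{t-1} - \alpha_0 - \alpha_1 y_0 - a)^2/(2\sigma_e^2)]\right) \sigma_a^{-1}\phi(a/\sigma_a)\, da.$$

If we now plug in the data $(y_{i0}, y_{i1}, \ldots, y_{iT})$ for each $i$ and take the log we get a conditional log-likelihood (conditional on $y_{i0}$) for each $i$. We can estimate the parameters by maximizing the sum of the log-likelihoods across $i$.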
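
The integral in part b is one-dimensional, so the conditional log likelihood for a single unit can be evaluated by numerical quadrature. The following Python sketch is only an illustration under stated assumptions: the function name `loglik_i`, the trapezoid rule, and the grid settings (`n_grid`, `width`) are hypothetical choices, not part of the problem or of any package. For large $T$ the product of densities can underflow, so a production implementation would work on the log scale.

```python
import math

def loglik_i(y, rho, alpha0, alpha1, sigma_e, sigma_a, n_grid=2001, width=8.0):
    """Log likelihood of (y_1,...,y_T) given y_0 for one unit; y[0] is y_0.

    Integrates the product of normal densities over a ~ Normal(0, sigma_a^2)
    with a trapezoid rule on [-width*sigma_a, width*sigma_a] (an assumed grid).
    """
    T = len(y) - 1
    lo, hi = -width * sigma_a, width * sigma_a
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        a = lo + k * step
        log_prod = 0.0
        for t in range(1, T + 1):
            resid = y[t] - rho * y[t - 1] - alpha0 - alpha1 * y[0] - a
            log_prod += (-0.5 * math.log(2 * math.pi * sigma_e ** 2)
                         - resid ** 2 / (2 * sigma_e ** 2))
        # Density of a: (1/sigma_a) * phi(a / sigma_a)
        dens_a = math.exp(-0.5 * (a / sigma_a) ** 2) / (sigma_a * math.sqrt(2 * math.pi))
        weight = 0.5 if k in (0, n_grid - 1) else 1.0   # trapezoid end points
        total += weight * math.exp(log_prod) * dens_a * step
    return math.log(total)
```

Summing `loglik_i` over units gives the objective to be maximized over $(\rho, \alpha_0, \alpha_1, \sigma_e, \sigma_a)$. With $T = 1$ the result can be checked against the closed form, because then $y_{i1}$ given $y_{i0}$ is normal with variance $\sigma_e^2 + \sigma_a^2$.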

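c. As before, we can replace $c_i$ with $\alpha_0 + \alpha_1 y_{i0} + a_i$. Then, the density of $y_{it}$ given $(y_{i,t-1}, \ldots, y_{i1}, y_{i0}, a_i)$ is

$$\mathrm{Normal}[\rho y_{i,t-1} + \alpha_0 + \alpha_1 y_{i0} + a_i + \delta(\alpha_0 + \alpha_1 y_{i0} + a_i) y_{i,t-1}, \sigma_e^2],$$

$t = 1, \ldots, T$. Using the same argument as in part b, we just integrate out $a_i$ to get the density of $(y_{i1}, \ldots, y_{iT})$ given $y_{i0} = y_0$:

$$\int_{-\infty}^{\infty} \left(\prod_{t=1}^{T} (2\pi\sigma_e^2)^{-1/2} \exp\{-[y_t - \rho y_{t-1} - \alpha_0 - \alpha_1 y_0 - a - \delta(\alpha_0 + \alpha_1 y_0 + a) y_{t-1}]^2/(2\sigma_e^2)\}\right) \sigma_a^{-1}\phi(a/\sigma_a)\, da.$$

Numerically, this could be a difficult MLE problem to solve. Assuming we can get the MLEs, we would estimate $\rho + \delta E(c_i)$ as $\hat\rho + \hat\delta(\hat\alpha_0 + \hat\alpha_1 \bar{y}_0)$, where $\bar{y}_0$ is the cross-sectional average of the initial observation.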


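d. The log likelihood for observation $i$, now conditional on $(y_{i0}, z_i)$, is the log of

$$\int_{-\infty}^{\infty} \left(\prod_{t=1}^{T} (2\pi\sigma_e^2)^{-1/2} \exp[-(y_{it} - \rho y_{i,t-1} - z_{it}\beta - \alpha_0 - \alpha_1 y_{i0} - \bar{z}_i\delta - a)^2/(2\sigma_e^2)]\right) \sigma_a^{-1}\phi(a/\sigma_a)\, da.$$

The assumption that we can put in the time average, $\bar{z}_i$, to account for correlation between $c_i$ and $(y_{i0}, z_i)$, may be too strong. It may be better to put in the full vector $z_i$, although this leads to many more parameters to estimate.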
13.12. a. The first order conditions can be written as

$$\sum_{i=1}^{N} x_{ij}[y_i - m(x_i, \hat\theta)] = 0, \quad j = 1, \ldots, K.$$

If, say, the first element of $x_i$ is unity, $x_{i1} = 1$, then the first entry of the FOC is

$$\sum_{i=1}^{N} [y_i - m(x_i, \hat\theta)] = 0$$

or

$$\sum_{i=1}^{N} \hat{u}_i = 0.$$


b. For the Bernoulli QLL with mean function $\Lambda(x\theta) = \exp(x\theta)/[1 + \exp(x\theta)]$ it is easily seen that the FOC is

$$\sum_{i=1}^{N} x_{ij}[y_i - \Lambda(x_i\hat\theta)] = 0, \quad j = 1, \ldots, K,$$

and so the canonical mean function is the logistic function. Therefore, $g(\mu) = \Lambda^{-1}(\mu)$, and to find $\Lambda^{-1}(\mu)$ we need to solve for $z$ as a function of $\mu$ in $\mu = \exp(z)/[1 + \exp(z)] = 1/[\exp(-z) + 1]$. So

$$\exp(-z) = \frac{1}{\mu} - 1 = \frac{1 - \mu}{\mu}$$

or

$$\exp(z) = \frac{\mu}{1 - \mu}.$$

Now just take the log to get $z = \log[\mu/(1 - \mu)]$.
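c. Generally, the FOC for the Poisson QMLE has the form

$$\sum_{i=1}^{N} \nabla_\theta m(x_i, \hat\theta)'[y_i - m(x_i, \hat\theta)]/m(x_i, \hat\theta) = 0$$

and, with $m(x, \theta) = \exp(x\theta)$, we get

$$\sum_{i=1}^{N} \exp(x_i\hat\theta)x_i'[y_i - \exp(x_i\hat\theta)]/\exp(x_i\hat\theta) = 0,$$

or

$$\sum_{i=1}^{N} x_i'[y_i - \exp(x_i\hat\theta)] = 0.$$

So $m(z) = \exp(z)$ is the canonical mean function and its inverse is $g(\mu) = \log(\mu)$.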


d. This is a true statement. The score for observation $i$ has the form

$$s_i(\theta) = x_i'[y_i - m(x_i, \theta)],$$

and therefore the Hessian, $-x_i'\nabla_\theta m(x_i, \theta)$, which has the form $-r(x_i, \theta)x_i'x_i$ for some function $r(\cdot) > 0$, does not depend on $y_i$. If $\theta^*$ is the plim of $\hat\theta$ whether or not the mean is correctly specified, then a consistent estimator of $-E[H(x_i, \theta^*)]$ is

$$N^{-1}\sum_{i=1}^{N} r(x_i, \hat\theta)x_i'x_i.$$

By contrast, with any other mean (link) function, the Hessian depends on $y_i$, and the estimators based on $E[H(x_i, \theta_o)|x_i]$ under the assumption the mean is correctly specified are generally inconsistent.
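
The canonical-link property from parts a and c can be checked numerically. The Python sketch below is a hypothetical illustration, not code from the text: it fits the Poisson QMLE with mean $\exp(\theta_0 + \theta_1 x)$ by Newton iterations on simulated data whose outcome is deliberately not Poisson, and then computes the two first order conditions. The function name `poisson_qmle`, the data-generating values, and the fixed iteration count are all assumptions.

```python
import math
import random

def poisson_qmle(x, y, n_iter=50):
    """Newton iterations for the Poisson QMLE with mean exp(b0 + b1*x)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: sum over i of (1, x_i)' * (y_i - mu_i)
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Minus the Hessian: sum of mu_i * (1, x_i)'(1, x_i); it does not involve y
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * s0 - h01 * s1) / det
        b1 += (h00 * s1 - h01 * s0) / det
    return b0, b1

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(500)]
# Nonnegative outcome that is not Poisson: exponential with mean exp(0.5 + 0.3*x)
y = [random.expovariate(1.0 / math.exp(0.5 + 0.3 * xi)) for xi in x]
b0_hat, b1_hat = poisson_qmle(x, y)
resid = [yi - math.exp(b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]
sum_resid = sum(resid)                                    # FOC for the intercept
sum_x_resid = sum(xi * ri for xi, ri in zip(x, resid))    # FOC for the slope
```

Because the model includes an intercept, `sum_resid` is zero up to rounding error, as in part a; the Newton step uses a Hessian that is free of $y_i$, which is the point of part d.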
13.13. In fact, there is nothing special about the QMLE setup for this problem: the conclusion holds for M-estimation. It is instructive to see the general argument.
To prove the result for general M-estimation (whether a minimization or maximization problem), use a mean value expansion and multiply through by $N^{-1/2}$:

$$N^{-1/2}\sum_{i=1}^{N} q(w_i, \hat\theta) = N^{-1/2}\sum_{i=1}^{N} q(w_i, \theta^*) + \left[N^{-1}\sum_{i=1}^{N} s_i(\ddot\theta)\right]'\sqrt{N}(\hat\theta - \theta^*),$$

where $s_i(\theta)$ is the $P \times 1$ score and $\ddot\theta$ is on the line segment between $\hat\theta$ and $\theta^*$. By Lemma 12.1,

$$N^{-1}\sum_{i=1}^{N} s_i(\ddot\theta) \overset{p}{\to} E[s_i(\theta^*)]$$

because $\ddot\theta \overset{p}{\to} \theta^*$. Under the regularity conditions in Theorem 12.3, $E[s_i(\theta^*)] = 0$, and so $N^{-1}\sum_{i=1}^{N} s_i(\ddot\theta) \overset{p}{\to} 0$. We also know $\sqrt{N}(\hat\theta - \theta^*) = O_p(1)$, and so

$$N^{-1/2}\sum_{i=1}^{N} q(w_i, \hat\theta) = N^{-1/2}\sum_{i=1}^{N} q(w_i, \theta^*) + o_p(1) \cdot O_p(1) = N^{-1/2}\sum_{i=1}^{N} q(w_i, \theta^*) + o_p(1).$$

13.14 (Bonus Question). Let $\{f_t(y_t|x_t; \theta) : t = 1, \ldots, T\}$ be a sequence of correctly specified densities for $y_{it}$ given $x_{it}$. That is, assume that there is $\theta_o \in \mathrm{int}(\Theta)$ such that $f_t(y_t|x_t; \theta_o)$ is the density of $y_{it}$ given $x_{it} = x_t$. Also assume that $\{x_{it} : t = 1, 2, \ldots, T\}$ is strictly exogenous for each $t$: $D(y_{it}|x_{i1}, \ldots, x_{iT}) = D(y_{it}|x_{it})$.

a. Is it true that, under the standard regularity conditions for partial MLE, $E[s_{it}(\theta_o)|x_{i1}, \ldots, x_{iT}] = 0$, where $s_{it}(\theta) = \nabla_\theta \log f_t(y_{it}|x_{it}; \theta)'$?

b. Under the assumptions given, is $\{s_{it}(\theta_o) : t = 1, \ldots, T\}$ necessarily serially uncorrelated?

c. Let $c_i$ be “unobserved heterogeneity” for cross section unit $i$, and assume that, for each $t$,

$$D(y_{it}|z_{i1}, \ldots, z_{iT}, c_i) = D(y_{it}|z_{it}, c_i).$$

In other words, $\{z_{it} : t = 1, \ldots, T\}$ is strictly exogenous conditional on $c_i$. Further, assume that

$$D(c_i|z_{i1}, \ldots, z_{iT}) = D(c_i|\bar{z}_i),$$

where $\bar{z}_i = T^{-1}(z_{i1} + \ldots + z_{iT})$ is the vector of time averages. Assuming that well-behaved, correctly-specified conditional densities are available, how do we choose $x_{it}$ to make part a applicable?

Solution 

a. This is true because, by the general theory for partial MLE, we know that $E[s_{it}(\theta_o)|x_{it}] = 0$, $t = 1, \ldots, T$. But if $D(y_{it}|x_{i1}, \ldots, x_{iT}) = D(y_{it}|x_{it})$ then, for any function $m_t(y_{it}, x_{it})$, $E[m_t(y_{it}, x_{it})|x_{i1}, \ldots, x_{iT}] = E[m_t(y_{it}, x_{it})|x_{it}]$, including the score function.

b. No. Strict exogeneity and complete dynamic specification of the conditional density are entirely different. Saying that $D(y_{it}|x_{i1}, \ldots, x_{iT})$ does not depend on $x_{is}$, $s \neq t$, says nothing about whether $y_{ir}$, $r < t$, appears in $D(y_{it}|x_{it}, y_{i,t-1}, x_{i,t-1}, \ldots, y_{i1}, x_{i1})$. Of course it is possible (if unlikely) for the score to be serially uncorrelated without complete dynamic specification, but that is still a separate issue from strict exogeneity.

c. We take $x_{it} = (z_{it}, \bar{z}_i)$, $t = 1, \ldots, T$. If $g_t(y_t|z_t, c; \gamma)$ is correctly specified for the density of $y_{it}$ given $(z_{it} = z_t, c_i = c)$, and $h(c|\bar{z}; \delta)$ is correctly specified for the density of $c_i$ given $\bar{z}_i = \bar{z}$, then the density of $y_{it}$ given $z_i$ is obtained as

$$f_t(y_t|z_t, \bar{z}; \theta_o) = \int g_t(y_t|z_t, c; \gamma_o)\, h(c|\bar{z}; \delta_o)\, \nu(dc)$$

and this clearly depends only on $(z_{it}, \bar{z}_i)$. In other words, under the assumptions given,

$$D(y_{it}|z_{i1}, \ldots, z_{iT}) = D(y_{it}|z_{it}, \bar{z}_i), \quad t = 1, \ldots, T,$$

which implies

$$D(y_{it}|x_{i1}, \ldots, x_{iT}) = D(y_{it}|x_{it}), \quad t = 1, \ldots, T.$$

Incidentally, we have not eliminated the serial dependence in $\{y_{it}\}$ after only conditioning on $(z_{it}, \bar{z}_i)$: the part of $c_i$ not explained by $\bar{z}_i$ affects $y_{it}$ in each time period.
13.15 (Bonus Question). Consider the problem of estimating quantiles in a parametric context. In particular, write

$$y = \alpha_o + x\beta_o + u$$
$$D(u|x) = \mathrm{Normal}(0, \sigma_o^2\exp(2x\gamma_o)).$$

This means that $\alpha_o + x\beta_o = E(y|x) = \mathrm{Med}(y|x)$.

a. For $0 < \tau < 1$ let $\eta_\tau$ be the $\tau$th quantile in the standard normal distribution. (So, for example, $\eta_{.95} = 1.645$.) Find $\mathrm{Quant}_\tau(y|x)$ in terms of $\eta_\tau$ and all of the parameters. When is $\mathrm{Quant}_\tau(y|x)$ a linear function of $x$?

b. Given a random sample of size $N$, how would you estimate $\mathrm{Quant}_\tau(y|x)$ for a given $\tau$?

c. Suppose we do not assume normality but use the weaker assumption that $u/[\sigma_o\exp(x\gamma_o)]$ is independent of $x$. Can we consistently estimate $\mathrm{Quant}_\tau(y|x)$ in this case?
Solution 


a. First note that

$$\mathrm{Quant}_\tau(y|x) = \alpha_o + x\beta_o + \mathrm{Quant}_\tau(u|x).$$

Let $r = u/[\sigma_o\exp(x\gamma_o)]$, so that $r$ is independent of $x$ with a Normal(0, 1) distribution. Because $u$ has a strictly increasing cdf conditional on $x$, its quantile $q_\tau(x)$ is the unique value such that

$$P[u \leq q_\tau(x)|x] = \tau,$$

or

$$P\left[\frac{u}{\sigma_o\exp(x\gamma_o)} \leq \frac{q_\tau(x)}{\sigma_o\exp(x\gamma_o)} \,\Big|\, x\right] = \tau$$

or

$$P\left[r \leq \frac{q_\tau(x)}{\sigma_o\exp(x\gamma_o)}\right] = \tau.$$

Because $r$ is independent of $x$, its $\tau$th quantile conditional on $x$ is $\eta_\tau$. Therefore, we must have

$$\frac{q_\tau(x)}{\sigma_o\exp(x\gamma_o)} = \eta_\tau$$

or

$$q_\tau(x) = \eta_\tau\sigma_o\exp(x\gamma_o).$$

So we have derived

$$\mathrm{Quant}_\tau(y|x) = \alpha_o + x\beta_o + \eta_\tau\sigma_o\exp(x\gamma_o).$$

$\mathrm{Quant}_\tau(y|x)$ is linear in $x$ for $\tau = .5$ because then $\eta_\tau = 0$. It is also linear in $x$ for any $\tau$ if $\gamma_o = 0$, in which case it can be written as

$$\mathrm{Quant}_\tau(y|x) = \alpha_o + x\beta_o + \eta_\tau\sigma_o.$$

Of course if $\gamma_o = 0$ then $u$ and $x$ are independent, and the quantile functions for different $\tau$ are parallel lines with different intercepts, $\alpha_o + \eta_\tau\sigma_o$.

b. Because we have specified

$$D(y|x) = \mathrm{Normal}(\alpha_o + x\beta_o, \sigma_o^2\exp(2x\gamma_o))$$

we can use maximum likelihood to estimate all parameters, given a random sample of size $N$. Then

$$\widehat{\mathrm{Quant}}_\tau(y|x) = \hat\alpha + x\hat\beta + \eta_\tau\hat\sigma\exp(x\hat\gamma).$$
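
The quantile formula derived in part a can be checked by simulation. The Python sketch below is a hypothetical illustration with a scalar $x$ and made-up parameter values (none of them come from the problem): it draws $(x, y)$ from the heteroskedastic normal model and records how often $y$ falls at or below the formula for $\mathrm{Quant}_\tau(y|x)$, a share that should be close to $\tau$.

```python
import math
import random
from statistics import NormalDist

def quantile_y(x, tau, alpha, beta, sigma, gamma):
    """Quant_tau(y|x) = alpha + x*beta + eta_tau * sigma * exp(x*gamma), scalar x."""
    eta_tau = NormalDist().inv_cdf(tau)   # tau-th quantile of the standard normal
    return alpha + x * beta + eta_tau * sigma * math.exp(x * gamma)

random.seed(1)
alpha, beta, sigma, gamma, tau = 1.0, 0.5, 2.0, 0.3, 0.95   # illustrative values only
n = 200_000
below = 0
for _ in range(n):
    x = random.uniform(-1.0, 1.0)
    u = sigma * math.exp(x * gamma) * random.gauss(0.0, 1.0)
    y = alpha + x * beta + u
    below += y <= quantile_y(x, tau, alpha, beta, sigma, gamma)
share_below = below / n   # should be near tau
```

In practice the true parameters would be replaced by the MLEs, as in part b.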
c. The quantile function still has the form

$$\mathrm{Quant}_\tau(y|x) = \alpha_o + x\beta_o + \eta_\tau\sigma_o\exp(x\gamma_o)$$

but we must treat $\eta_\tau$ as an unknown parameter because we do not know the distribution of $r$. Note that the distribution of $r$ may be asymmetric. The key restriction is that $D(r|x)$ does not depend on $x$.
We know $\eta_\tau$ is the $\tau$th quantile of the random variable $r$. If we observed $\{r_i : i = 1, \ldots, N\}$ we could estimate $\eta_\tau$ as the $\tau$th sample quantile of the $r_i$. Instead, we can use the standardized residuals

$$\hat{r}_i = \frac{\hat{u}_i}{\hat\sigma\exp(x_i\hat\gamma)} = \frac{y_i - \hat\alpha - x_i\hat\beta}{\hat\sigma\exp(x_i\hat\gamma)}$$

and compute $\hat\eta_\tau$ as the $\tau$th sample quantile of $\{\hat{r}_i : i = 1, \ldots, N\}$. Because $\hat\eta_\tau$ solves the problem

$$\min_{\eta} \sum_{i=1}^{N} c_\tau(\hat{r}_i - \eta),$$

where $c_\tau(\cdot)$ is the “check” function defined in Section 12.10, we can conclude $\hat\eta_\tau$ is generally consistent using the consistency result for two-step M-estimators in Section 12.4. Of course, we have to have consistent estimators of the other parameters. From the results of Gourieroux, Monfort, and Trognon (1984a), the normal QMLE is generally consistent for $\alpha_o$, $\beta_o$, $\sigma_o$, and $\gamma_o$ even if normality does not hold. (As usual, we would need to use a sandwich covariance matrix estimator for inference on these parameters.) Obtaining a valid standard error for $\hat\eta_\tau$, and then getting the joint variance-covariance matrix of all parameter estimators, is challenging. Probably the nonparametric bootstrap is valid.

Solutions to Chapter 14 Problems

14.1. a. The simplest way to estimate (14.35) is by 2SLS, using instruments $(x_1, x_2)$. Nonlinear functions of these can be added to the instrument list, and they would generally improve efficiency if $\gamma_2 \neq 1$. If $E(u_2^2|x) = \sigma_2^2$, 2SLS using the given list of instruments is the efficient, single equation GMM estimator. If there is heteroskedasticity, an optimal weighting matrix that allows heteroskedasticity of unknown form should be used. Finally, one could try to use the optimal instruments derived in Section 14.4.3. Even under homoskedasticity, these are difficult, if not impossible, to find analytically if $\gamma_2 \neq 1$.

With $y_2 \geq 0$, equation (14.35) is suspect as a structural equation because it is a linear model, and generally there are outcomes where $x_2\delta_2 + \gamma_3 y_1 + u_2 < 0$.

b. No. If $\gamma_1 = 0$ the parameter $\gamma_2$ does not appear in the model. Of course, if we knew $\gamma_1 = 0$, we would consistently estimate $\delta_1$ by OLS.

c. We can see this by obtaining $E(y_1|x)$:

$$E(y_1|x) = x_1\delta_1 + \gamma_1 E(y_2^{\gamma_2}|x) + E(u_1|x) = x_1\delta_1 + \gamma_1 E(y_2^{\gamma_2}|x).$$

Now, when $\gamma_2 \neq 1$, $E(y_2^{\gamma_2}|x) \neq [E(y_2|x)]^{\gamma_2}$, so we cannot write

$$E(y_1|x) = x_1\delta_1 + \gamma_1(x_2\delta_2)^{\gamma_2};$$

in fact, we cannot find $E(y_1|x)$ without more assumptions. While the regression $y_2$ on $x_2$ consistently estimates $\delta_2$, the two-step NLS estimator of $y_1$ on $x_1$, $(x_2\hat\delta_2)^{\gamma_2}$ will not be consistent for $\delta_1$ and $\gamma_2$. (This is an example of a “forbidden regression,” which we discussed in Chapter 9.) When $\gamma_2 = 1$ and we impose this in estimation, we obtain the usual 2SLS estimator.

14.2. a. When $\rho_1 = 1$, we obtain the level-level model, $hours = -\gamma_1 + z_1\delta_1 + \gamma_1 wage + u_1$. Using the hint, let $\rho_1 \to 0$ to get $hours = z_1\delta_1 + \gamma_1\log(wage) + u_1$.

b. We cannot use a standard $t$ test after estimating the full model (say, by nonlinear 2SLS), because $\rho_1$ cannot be estimated under $H_0$. The score test and QLR test also fail because of lack of identification under $H_0$. What we can do is fix a value for $\rho_1$ (essentially our best guess) and then use a $t$ test on $(wage^{\rho_1} - 1)/\rho_1$ after linear 2SLS estimation (or GMM more generally). This need not be a very good test for detecting $\gamma_1 \neq 0$ if our guess for $\rho_1$ is not close to the actual value. There is a growing literature on testing hypotheses when parameters are not identified under the null.

c. If $\mathrm{Var}(u_1|z) = \sigma_1^2$, use nonlinear 2SLS, where we would use $z$ and functions of $z$ as IVs. If we are not willing to assume homoskedasticity, GMM is generally more efficient.

d. The residual function is $r(\theta) = [hours - z_1\delta_1 - \gamma_1(wage^{\rho_1} - 1)/\rho_1]$, where $\theta = (\delta_1', \gamma_1, \rho_1)'$. Using the hint the gradient is

$$\nabla_\theta r(\theta) = \{-z_1, -(wage^{\rho_1} - 1)/\rho_1, \gamma_1[(wage^{\rho_1} - 1) - \rho_1 wage^{\rho_1}\log(wage)]/\rho_1^2\}.$$

The score is just the transpose.
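
The gradient in part d can be verified against central finite differences. The Python sketch below is a hypothetical check with a scalar $z_1$ and arbitrary numbers (the values of `hours`, `z1`, `wage`, and `theta` are made up for illustration).

```python
import math

def resid(hours, z1, wage, delta1, gamma1, rho1):
    """Residual r(theta) for scalar z1: hours - z1*delta1 - gamma1*(wage^rho1 - 1)/rho1."""
    return hours - z1 * delta1 - gamma1 * (wage ** rho1 - 1.0) / rho1

def grad_resid(z1, wage, gamma1, rho1):
    """Analytical gradient of r(theta) with respect to (delta1, gamma1, rho1)."""
    w_rho = wage ** rho1
    return (
        -z1,
        -(w_rho - 1.0) / rho1,
        gamma1 * ((w_rho - 1.0) - rho1 * w_rho * math.log(wage)) / rho1 ** 2,
    )

hours, z1, wage = 40.0, 1.5, 12.0     # arbitrary data point
theta = (2.0, 3.0, 0.7)               # arbitrary (delta1, gamma1, rho1)
h = 1e-6
numerical = []
for j in range(3):
    up = list(theta); up[j] += h
    dn = list(theta); dn[j] -= h
    # Central difference in the j-th parameter
    numerical.append((resid(hours, z1, wage, *up) - resid(hours, z1, wage, *dn)) / (2 * h))
analytical = grad_resid(z1, wage, theta[1], theta[2])
```

The two vectors `numerical` and `analytical` agree to several decimal places, which confirms the sign and the $\rho_1^2$ denominator of the third element.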

e. Estimate $\delta_1$ and $\gamma_1$ by 2SLS, or use the GMM estimator that accounts for heteroskedasticity, under the restriction $\rho_1 = 1$. Suppose the instruments are $z_i$, a $1 \times L$ vector. This is just linear estimation because the model is linear under $H_0$. Then, taking $Z_i = z_i$, and

$$r_i(\tilde\theta) = [hours_i - z_{i1}\tilde\delta_1 - \tilde\gamma_1(wage_i - 1)]$$
$$\nabla_\theta r_i(\tilde\theta) = (-z_{i1}, -(wage_i - 1), \tilde\gamma_1[(wage_i - 1) - wage_i\log(wage_i)]),$$

use the score statistic in equation (14.32).


14.3. Let $Z_i^*$ be the $G \times P$ matrix of optimal instruments in (14.57), where we suppress its dependence on $x_i$. Let $Z_i$ be the $G \times L$ matrix that is a function of $x_i$ and let $\Xi_o$ be the probability limit of the weighting matrix. Then the asymptotic variance of the GMM estimator has the form (14.10) with $G_o = E[Z_i'R_o(x_i)]$. So, in (14.48) take $A = G_o'\Xi_oG_o$ and $s(w_i) = G_o'\Xi_oZ_i'r(w_i, \theta_o)$. The optimal score function is $s^*(w_i) = R_o(x_i)'\Omega_o(x_i)^{-1}r(w_i, \theta_o)$. Now we can verify (14.51) with $\rho = 1$:

$$E[s(w_i)s^*(w_i)'] = G_o'\Xi_oE[Z_i'r(w_i, \theta_o)r(w_i, \theta_o)'\Omega_o(x_i)^{-1}R_o(x_i)]$$
$$= G_o'\Xi_oE[Z_i'E\{r(w_i, \theta_o)r(w_i, \theta_o)'|x_i\}\Omega_o(x_i)^{-1}R_o(x_i)]$$
$$= G_o'\Xi_oE[Z_i'\Omega_o(x_i)\Omega_o(x_i)^{-1}R_o(x_i)] = G_o'\Xi_oG_o = A.$$


14.4. a. The residual function for the conditional mean model $E(y_i|x_i) = m(x_i, \beta_o)$ is $r_i(\beta) = y_i - m(x_i, \beta)$. Then $\Omega_o(x_i)$ in (14.55) is just a scalar, $\Omega_o(x_i) = \mathrm{Var}(y_i|x_i) = \omega_o(x_i)$. Under WNLS.3, $\omega_o(x_i) = \sigma_o^2h(x_i, \gamma_o)$ for a known function $h(\cdot)$. Further, $R_o(x_i) = E[\nabla_\beta r_i(\beta_o)|x_i] = -\nabla_\beta m(x_i, \beta_o)$, and so the optimal instruments are $\nabla_\beta m(x_i, \beta_o)/\omega_o(x_i)$. The asymptotic variance of the efficient IV estimator is obtained from (14.60):

$$\{E[\nabla_\beta m(x_i, \beta_o)'[\omega_o(x_i)]^{-1}\nabla_\beta m(x_i, \beta_o)]\}^{-1} = \sigma_o^2\{E[\nabla_\beta m(x_i, \beta_o)'\nabla_\beta m(x_i, \beta_o)/h(x_i, \gamma_o)]\}^{-1},$$

which is the asymptotic variance of the WNLS estimator under WNLS.1, WNLS.2, and WNLS.3.

b. If $\mathrm{Var}(y_i|x_i) = \sigma_o^2$ then NLS achieves the efficiency bound, as is seen by setting $h(x_i, \gamma_o) = 1$ in part a.
c. Now let $r_{i1}(\theta) = u_i(\beta) = y_i - m(x_i, \beta)$ and $r_{i2}(\beta, \sigma^2) = [y_i - m(x_i, \beta)]^2 - \sigma^2$. Let $r_i(\theta)$ denote the $2 \times 1$ vector obtained by stacking the two residual functions. Then the moment conditions can be written as

$$E[r_i(\theta_o)|x_i] = 0,$$

where $\theta_o = (\beta_o', \sigma_o^2)'$. To obtain the efficient IVs, we first need $E[\nabla_\theta r_i(\theta_o)|x_i]$. But

$$\nabla_\theta r_i(\theta) = \begin{pmatrix} -\nabla_\beta m_i(\beta) & 0 \\ -2\nabla_\beta m_i(\beta)u_i(\beta) & -1 \end{pmatrix}.$$

Evaluating at $\theta_o$ and using $E[u_i(\beta_o)|x_i] = 0$ gives

$$R_o(x_i) = E[\nabla_\theta r_i(\theta_o)|x_i] = \begin{pmatrix} -\nabla_\beta m_i(\beta_o) & 0 \\ 0 & -1 \end{pmatrix}.$$

We also need

$$\Omega_o(x_i) = \begin{pmatrix} \sigma_o^2 & E(u_i^3|x_i) \\ E(u_i^3|x_i) & E(u_i^4|x_i) - \sigma_o^4 \end{pmatrix},$$

where $u_i = y_i - m(x_i, \beta_o)$. The optimal IVs are $[\Omega_o(x_i)]^{-1}R_o(x_i)$. If $E(u_i^3|x_i) = 0$, as occurs under conditional symmetry of $u_i$, then the asymptotic variance matrix of the optimal IV estimator is block diagonal, and for $\hat\beta$ it is the same as NLS. In other words, adding the moment condition for the homoskedasticity assumption does not improve efficiency over NLS under symmetry, even if $E(u_i^4|x_i)$ is not constant. But there is something subtle here. The NLS estimator is efficient in the class of estimators that only uses information on the first two conditional moments. If we use the information $E(u_i^3|x_i) = 0$ then, in general, we could do better. But, of course, such an estimator would be less robust than NLS.

If, in addition, $E(u_i^4|x_i)$ is constant, then the usual estimator of $\sigma_o^2$ based on the sum of squared NLS residuals is efficient (among estimators that only use the first two conditional moments, but it happens that $E(u_i^3|x_i) = 0$ and $E(u_i^4|x_i)$ is constant).


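14.5. We can write the unrestricted linear projection as

$$y_{it} = \pi_{t0} + x_i\pi_t + v_{it}, \quad t = 1, 2, 3,$$

where the intercept and slopes for period $t$ together form a $(1 + 3K) \times 1$ vector, and then $\pi$ is the $(3 + 9K) \times 1$ vector obtained by stacking these over $t$. Let $\theta = (\psi, \lambda_1', \lambda_2', \lambda_3', \beta')'$. With the restrictions imposed on the $\pi_t$, we have

$$\pi_{t0} = \psi, \ t = 1, 2, 3, \quad \pi_1 = [(\lambda_1 + \beta)', \lambda_2', \lambda_3']',$$
$$\pi_2 = [\lambda_1', (\lambda_2 + \beta)', \lambda_3']', \quad \pi_3 = [\lambda_1', \lambda_2', (\lambda_3 + \beta)']'.$$

Therefore, we can write $\pi = H\theta$ for the $(3 + 9K) \times (1 + 4K)$ matrix $H$ defined by

$$H = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & I_K & 0 & 0 & I_K \\ 0 & 0 & I_K & 0 & 0 \\ 0 & 0 & 0 & I_K & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & I_K & 0 & 0 & 0 \\ 0 & 0 & I_K & 0 & I_K \\ 0 & 0 & 0 & I_K & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & I_K & 0 & 0 & 0 \\ 0 & 0 & I_K & 0 & 0 \\ 0 & 0 & 0 & I_K & I_K \end{pmatrix}.$$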
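
The block structure of $H$ is easy to get wrong by hand, so it can help to build it programmatically and confirm that $H\theta$ reproduces the restricted $\pi$. The Python sketch below assumes NumPy is available; the helper name `build_H` and the random test values are hypothetical.

```python
import numpy as np

def build_H(K):
    """H such that pi = H @ theta, with theta = (psi, lam1, lam2, lam3, beta)."""
    I, Z = np.eye(K), np.zeros((K, K))
    z_row = np.zeros((1, K))
    blocks = []
    for t in range(3):
        # Intercept row for period t: pi_t0 = psi
        blocks.append(np.hstack([np.ones((1, 1))] + [z_row] * 4))
        for s in range(3):
            # Coefficient on x_is in period t: lam_s, plus beta when s == t
            row = [np.zeros((K, 1))] + [I if j == s else Z for j in range(3)]
            row.append(I if s == t else Z)
            blocks.append(np.hstack(row))
    return np.vstack(blocks)

K = 2
H = build_H(K)
rng = np.random.default_rng(0)
psi, lam, beta = 0.5, rng.normal(size=(3, K)), rng.normal(size=K)
theta = np.concatenate([[psi], lam.ravel(), beta])
pi = H @ theta   # stacked (pi_10, pi_1', pi_20, pi_2', pi_30, pi_3')'
```

For example, the slope on $x_{i1}$ in the first period, `pi[1:1+K]`, equals `lam[0] + beta`, while the slope on $x_{i2}$ in the first period equals `lam[1]`.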


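14.6. By the hint, it suffices to show that

$$[\mathrm{Avar}\,\sqrt{N}(\hat\theta - \theta_o)]^{-1} - [\mathrm{Avar}\,\sqrt{N}(\tilde\theta - \theta_o)]^{-1}$$

is p.s.d. This difference is $H_o'\Xi_o^{-1}H_o - H_o'\Lambda_o^{-1}H_o = H_o'(\Xi_o^{-1} - \Lambda_o^{-1})H_o$. This is positive semi-definite if $\Xi_o^{-1} - \Lambda_o^{-1}$ is p.s.d., which again holds by the hint because $\Lambda_o - \Xi_o$ is assumed to be p.s.d.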


14.7. With $h(\theta) = H\theta$, the minimization problem becomes

$$\min_{\theta \in \mathbb{R}^P} (\hat\pi - H\theta)'\hat\Xi^{-1}(\hat\pi - H\theta),$$

where it is assumed that no restrictions are placed on $\theta$. The first order condition is easily seen to be

$$-2H'\hat\Xi^{-1}(\hat\pi - H\hat\theta) = 0 \quad \text{or} \quad (H'\hat\Xi^{-1}H)\hat\theta = H'\hat\Xi^{-1}\hat\pi.$$

Therefore, assuming $H'\hat\Xi^{-1}H$ is nonsingular (which occurs w.p.a.1. when $H'\Xi_o^{-1}H$ is nonsingular) we have $\hat\theta = (H'\hat\Xi^{-1}H)^{-1}H'\hat\Xi^{-1}\hat\pi$.
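
The closed form for the linear CMD estimator translates directly into a few lines of linear algebra. The Python sketch below assumes NumPy; the function name `cmd_linear` and the toy inputs are hypothetical, and the example is constructed so that the restrictions hold exactly, in which case the estimator returns the true $\theta$ for any positive definite weighting matrix.

```python
import numpy as np

def cmd_linear(pi_hat, H, Xi_hat):
    """theta_hat = (H' Xi^{-1} H)^{-1} H' Xi^{-1} pi_hat."""
    Xi_inv_H = np.linalg.solve(Xi_hat, H)   # Xi^{-1} H without forming the inverse
    A = H.T @ Xi_inv_H                      # H' Xi^{-1} H
    b = Xi_inv_H.T @ pi_hat                 # H' Xi^{-1} pi_hat (Xi is symmetric)
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 2))
theta_true = np.array([1.0, -2.0])
M = rng.normal(size=(6, 6))
Xi_hat = M @ M.T + 6 * np.eye(6)   # any symmetric positive definite matrix
pi_hat = H @ theta_true            # restrictions hold exactly in this toy example
theta_hat = cmd_linear(pi_hat, H, Xi_hat)
foc = H.T @ np.linalg.solve(Xi_hat, pi_hat - H @ theta_hat)   # first order condition
```

Using `np.linalg.solve` instead of an explicit inverse is a numerical convenience only; it does not change the estimator.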

14.8. From the efficiency discussion about maximum likelihood in Section 14.4.2, it is no less asymptotically efficient to use the density of $(y_{i0}, y_{i1}, \ldots, y_{iT})$ than to use the conditional distribution of $(y_{i1}, \ldots, y_{iT})$ given $y_{i0}$. The cost of the asymptotic efficiency is that if we misspecify $f_0(y_0; \theta)$, then the unconditional MLE will generally be inconsistent for $\theta_o$. The MLE that conditions on $y_{i0}$ is consistent provided we have the densities $f_t(y_t|y_{t-1}; \theta)$ correctly specified, $t \geq 1$. As $f_t(y_t|y_{t-1}; \theta)$ is the density of interest, we are usually willing to put more effort into testing our specification of it.

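14.9. We have to verify equations (14.49) and (14.50) for the random effects and fixed effects estimators. The choices of $s_{i1}$, $s_{i2}$ (with added $i$ subscripts for clarity), $A_1$, and $A_2$ are given in the hint. Now, from Chapter 10, we know that $E(r_ir_i'|x_i) = \sigma_u^2I_T$ under RE.1, RE.2, and RE.3, where $r_i = v_i - \lambda j_T\bar{v}_i$. Therefore,

$$E(s_{i1}s_{i1}') = E(\check{X}_i'r_ir_i'\check{X}_i) = \sigma_u^2E(\check{X}_i'\check{X}_i) = \sigma_u^2A_1$$

by the usual iterated expectations argument. This means that, in (14.49), $\rho = \sigma_u^2$. Now, we just need to verify (14.50) for this choice of $\rho$. But $s_{i2}s_{i1}' = \ddot{X}_i'u_ir_i'\check{X}_i$ and, as described in the hint,

$$\ddot{X}_i'r_i = \ddot{X}_i'(v_i - \lambda j_T\bar{v}_i) = \ddot{X}_i'v_i = \ddot{X}_i'(c_ij_T + u_i) = \ddot{X}_i'u_i.$$

Therefore, $s_{i2}s_{i1}' = \ddot{X}_i'r_ir_i'\check{X}_i$ and so

$$E(s_{i2}s_{i1}'|x_i) = \ddot{X}_i'E(r_ir_i'|x_i)\check{X}_i = \sigma_u^2\ddot{X}_i'\check{X}_i.$$

It follows that

$$E(s_{i2}s_{i1}') = E[\ddot{X}_i'E(r_ir_i'|x_i)\check{X}_i] = \sigma_u^2E(\ddot{X}_i'\check{X}_i).$$

Finally, $\ddot{X}_i'\check{X}_i = \ddot{X}_i'(X_i - \lambda j_T\bar{x}_i) = \ddot{X}_i'X_i = \ddot{X}_i'\ddot{X}_i$ and so $E(s_{i2}s_{i1}') = \sigma_u^2E(\ddot{X}_i'\ddot{X}_i)$, and this verifies (14.50) with $\rho = \sigma_u^2$.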
14.10. a. For each $t$ we have

$$E(v_{it}|x_i) = E(\eta_tc_i + u_{it}|x_i) = E(\eta_tc_i|x_i) + E(u_{it}|x_i) = \eta_tE(c_i|x_i) + E(u_{it}|x_i) = \eta_t \cdot 0 + 0 = 0$$

because $E(c_i|x_i) = 0$ and $E(u_{it}|x_i) = 0$ (because $E(u_{it}|x_i, c_i) = 0$).

b. Under the assumptions (which are the same as Assumptions RE.1 and RE.3) we know that

$$\mathrm{Var}(u_{it}) = \sigma_u^2, \ t = 1, \ldots, T$$
$$\mathrm{Cov}(c_i, u_{it}) = 0, \ t = 1, \ldots, T$$
$$\mathrm{Cov}(u_{it}, u_{is}) = 0, \ t \neq s.$$

Therefore,

$$\mathrm{Var}(v_{it}) = \mathrm{Var}(\eta_tc_i + u_{it}) = \eta_t^2\sigma_c^2 + 2\eta_t\mathrm{Cov}(c_i, u_{it}) + \sigma_u^2 = \eta_t^2\sigma_c^2 + \sigma_u^2$$

and, for $t \neq s$,

$$\mathrm{Cov}(v_{it}, v_{is}) = \mathrm{Cov}(\eta_tc_i + u_{it}, \eta_sc_i + u_{is})$$
$$= \mathrm{Cov}(\eta_tc_i, \eta_sc_i) + \mathrm{Cov}(\eta_tc_i, u_{is}) + \mathrm{Cov}(\eta_sc_i, u_{it}) + \mathrm{Cov}(u_{it}, u_{is})$$
$$= \eta_t\eta_s\mathrm{Cov}(c_i, c_i) = \eta_t\eta_s\sigma_c^2.$$

c. The usual RE estimator treats the $\eta_t$ as constant (which can then be normalized to be unity). In other words, it uses a misspecified model for $\Omega = \mathrm{Var}(v_i)$. As we discussed in Chapter 10, the RE estimator is still consistent and $\sqrt{N}$-asymptotically normal, and we can conduct inference using a robust variance matrix estimator.

A more efficient estimator is, of course, FGLS with the correct form of the variance-covariance matrix. Write the $T$ time periods for draw $i$ as

$$y_i = X_i\beta + v_i$$
$$E(v_i|x_i) = 0,$$

where the $t$th row of $X_i$ is $x_{it}$. The $T \times T$ variance-covariance matrix (which is also the variance conditional on $x_i$) is

$$\mathrm{Var}(v_i) = \begin{pmatrix} \sigma_c^2 + \sigma_u^2 & \eta_2\sigma_c^2 & \cdots & \eta_T\sigma_c^2 \\ \eta_2\sigma_c^2 & \eta_2^2\sigma_c^2 + \sigma_u^2 & \cdots & \eta_2\eta_T\sigma_c^2 \\ \vdots & \vdots & \ddots & \vdots \\ \eta_T\sigma_c^2 & \eta_T\eta_2\sigma_c^2 & \cdots & \eta_T^2\sigma_c^2 + \sigma_u^2 \end{pmatrix},$$

where we impose the normalization $\eta_1 = 1$. The GLS estimator is

$$\hat\beta_{GLS} = \left(\sum_{i=1}^{N} X_i'\Omega^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{N} X_i'\Omega^{-1}y_i\right).$$

Of course, we would have to estimate $\sigma_c^2$, $\sigma_u^2$, and $\eta_2, \ldots, \eta_T$. One way to approach estimation of the variance-covariance parameters is to write

$$v_{it}v_{is} = \eta_t\eta_s\sigma_c^2 + d_{ts}\sigma_u^2 + r_{its}$$
$$E(r_{its}) = 0$$

for all $t, s = 1, 2, \ldots, T$, where $d_{ts}$ is a dummy variable equal to one if $t = s$, and zero otherwise. Then we can estimate the parameters by pooled NLS after replacing $v_{it}v_{is}$ with $\hat{v}_{it}\hat{v}_{is}$, where the $\hat{v}_{it}$ are perhaps the RE residuals (or they could be the POLS residuals). Note that $\eta_1 = 1$ is imposed. Then we can form $\hat\Omega$ and then use FGLS.
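
The variance matrix in part c has the compact form $\Omega = \sigma_c^2\eta\eta' + \sigma_u^2I_T$ with $\eta = (1, \eta_2, \ldots, \eta_T)'$, which makes both the matrix and the GLS estimator simple to compute. The Python sketch below assumes NumPy; the function names `omega` and `gls`, the simulated design, and all parameter values are hypothetical, and the variance parameters are treated as known rather than estimated by the pooled NLS step described above.

```python
import numpy as np

def omega(eta, sigma2_c, sigma2_u):
    """Var(v_i) = sigma_c^2 * eta eta' + sigma_u^2 * I_T."""
    eta = np.asarray(eta, dtype=float)
    return sigma2_c * np.outer(eta, eta) + sigma2_u * np.eye(eta.size)

def gls(X, y, Om):
    """GLS for X of shape (N, T, K) and y of shape (N, T) with a common T x T matrix Om."""
    Om_inv = np.linalg.inv(Om)
    A = np.einsum('itk,ts,isl->kl', X, Om_inv, X)   # sum_i X_i' Om^{-1} X_i
    b = np.einsum('itk,ts,is->k', X, Om_inv, y)     # sum_i X_i' Om^{-1} y_i
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
N, T, K = 2000, 3, 2
eta, sigma2_c, sigma2_u = [1.0, 0.5, 2.0], 1.0, 0.5   # eta_1 = 1 is the normalization
beta = np.array([1.0, -1.0])
X = rng.normal(size=(N, T, K))
c = rng.normal(scale=np.sqrt(sigma2_c), size=(N, 1))
u = rng.normal(scale=np.sqrt(sigma2_u), size=(N, T))
v = c * np.asarray(eta) + u          # v_it = eta_t * c_i + u_it
y = X @ beta + v
Om = omega(eta, sigma2_c, sigma2_u)
beta_gls = gls(X, y, Om)
```

In an application, `Om` would be replaced by $\hat\Omega$ built from the estimated $\sigma_c^2$, $\sigma_u^2$, and $\eta_t$.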


14.11. First estimate initial parameters $\pi_o$ from a set of linear reduced-form equations:

$$y_i = X_i\pi_o + u_i,$$

where $\pi_o$ is $K \times 1$ and unrestricted. Then estimate $\pi_o$ by, say, system OLS. Or, if we assumed $\mathrm{Var}(u_i|x_i) = \Omega_o$, then FGLS would be no less asymptotically efficient.

Given $\hat\pi$ and $\hat\Xi$ consistent for

$$\Xi_o = \mathrm{Avar}[\sqrt{N}(\hat\pi - \pi_o)],$$

the CMD estimator of $\theta_o$, $\hat\theta$, solves

$$\min_{\theta \in \Theta} [\hat\pi - g(\theta)]'\hat\Xi^{-1}[\hat\pi - g(\theta)],$$

which is algebraically equivalent to a weighted multivariate nonlinear least squares problem where $\hat\pi$ plays the role of the $K \times 1$ vector of “dependent variables.” As discussed in the case where $g(\theta)$ is linear, the asymptotic analysis of the CMD estimator is different from the standard WMNLS problem: here $K$ is fixed.

After estimation, $\mathrm{Avar}(\hat\theta)$ is estimated as

$$(\hat{G}'\hat\Xi^{-1}\hat{G})^{-1}/N,$$

where $\hat{G} = \nabla_\theta g(\hat\theta)$.

Solutions to Chapter 15 Problems

15.1. a. Because the regressors are all orthogonal by construction (that is, $dk_i \cdot dm_i = 0$ for $k \neq m$, and all $i$) the coefficient on $dm_i$ is obtained from the regression $y_i$ on $dm_i$, $i = 1, \ldots, N$. But this is easily seen to be the fraction of ones among the sample observations falling into category $m$ (because it is the average of $y_i$ over the observations from category $m$). Therefore, the fitted value for any $i$ is the cell frequency for the appropriate category. These frequencies are all necessarily in $[0,1]$.

b. The fitted values for each category will be the same. If we drop $d1_i$ but add an overall intercept, the overall intercept is the cell frequency for the first category, and the coefficient on $dm_i$ becomes the difference in cell frequencies between category $m$ and category one (the base category), $m = 2, \ldots, M$.
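
Both parts can be confirmed on simulated data. The Python sketch below assumes NumPy; the sample size, the number of categories, and the response probabilities are made-up values used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 300, 4
cat = rng.integers(0, M, size=N)                        # category of each observation
y = (rng.random(N) < 0.2 + 0.15 * cat).astype(float)    # binary outcome
D = (cat[:, None] == np.arange(M)).astype(float)        # N x M exhaustive dummies

# Part a: regression on all M dummies, no intercept
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
cell_freq = np.array([y[cat == m].mean() for m in range(M)])

# Part b: drop the first dummy and add an intercept
X = np.column_stack([np.ones(N), D[:, 1:]])
coef_b, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Here `coef` equals `cell_freq`, `coef_b[0]` equals the first cell frequency, and `coef_b[1:]` equals the differences from the base category, so the fitted values are identical across the two parameterizations.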

15.2. a. First, because utility is increasing in both $c$ and $q$, the budget constraint is binding at the optimum: $c_i + p_iq_i = m_i$. Plugging $c = m_i - p_iq$ into the utility function reduces the problem to

$$\max_{q \geq 0} \ (m_i - p_iq) + a_i\log(1 + q).$$

Define utility as a function of $q$, as

$$s_i(q) = (m_i - p_iq) + a_i\log(1 + q).$$

Then, for all $q \geq 0$,

$$\frac{ds_i}{dq}(q) = -p_i + \frac{a_i}{1 + q}.$$

The optimal solution is $q_i = 0$ if the marginal utility of charitable giving at $q = 0$ is nonpositive, that is, if

$$\frac{ds_i}{dq}(0) = -p_i + a_i \leq 0 \quad \text{or} \quad a_i \leq p_i.$$

(This can also be obtained by solving the Kuhn-Tucker conditions.) Thus, for this utility function, $a_i$ can be interpreted as the reservation price above which no charitable contribution will be made; in other words, we have the corner solution $q_i = 0$ whenever the price of charitable giving is too high relative to the marginal utility of charitable giving. On the other hand, if $a_i > p_i$ then an interior solution exists ($q_i > 0$) and necessarily solves the first order condition

$$\frac{ds_i}{dq}(q_i) = -p_i + \frac{a_i}{1 + q_i} = 0$$

or

$$1 + q_i = a_i/p_i.$$
b. By definition of $y_i$, $y_i = 1$ if and only if $a_i/p_i > 1$ or $\log(a_i/p_i) > 0$. If $a_i = \exp(z_i\gamma + v_i)$, the condition for $y_i = 1$ is equivalent to $z_i\gamma + v_i - \log p_i > 0$. Therefore,

$$P(y_i = 1|z_i, m_i, p_i) = P(y_i = 1|z_i, p_i)$$
$$= P(z_i\gamma + v_i - \log p_i > 0|z_i, p_i) = P[v_i/\sigma > (-z_i\gamma + \log p_i)/\sigma]$$
$$= 1 - G[(-z_i\gamma + \log p_i)/\sigma] = G[(z_i\gamma - \log p_i)/\sigma],$$

where the last equality follows by symmetry of the distribution of $v_i/\sigma$.
15.3. a. If $P(y = 1|z_1, z_2) = \Phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2z_2^2)$ then

$$\frac{\partial P(y = 1|z_1, z_2)}{\partial z_2} = (\gamma_1 + 2\gamma_2z_2) \cdot \phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2z_2^2);$$

for given $z$, the partial effect is estimated as

$$(\hat\gamma_1 + 2\hat\gamma_2z_2) \cdot \phi(z_1\hat\delta_1 + \hat\gamma_1z_2 + \hat\gamma_2z_2^2),$$

where, of course, the estimates are the probit estimates.


b. In the model

$$P(y = 1|z_1, z_2, d_1) = \Phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2d_1 + \gamma_3z_2d_1),$$

the partial effect of $z_2$ is

$$\frac{\partial P(y = 1|z_1, z_2, d_1)}{\partial z_2} = (\gamma_1 + \gamma_3d_1) \cdot \phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2d_1 + \gamma_3z_2d_1).$$

The effect of $d_1$ is measured as the difference in the probabilities at $d_1 = 1$ and $d_1 = 0$:

$$P(y = 1|z, d_1 = 1) - P(y = 1|z, d_1 = 0) = \Phi[z_1\delta_1 + \gamma_2 + (\gamma_1 + \gamma_3)z_2] - \Phi(z_1\delta_1 + \gamma_1z_2).$$

Again, to estimate these effects at given $z$ and (in the first case) $d_1$, we just replace the parameters with their probit estimates, and use average or other interesting values of $z$.
c. If the estimated partial effect is for particular values of $(z_1, z_2, d_1)$, for example,

$$(\hat\gamma_1 + \hat\gamma_3d_1^o) \cdot \phi(z_1^o\hat\delta_1 + \hat\gamma_1z_2^o + \hat\gamma_2d_1^o + \hat\gamma_3z_2^od_1^o),$$

then we can apply the delta method from Chapter 3 (and referred to in Part III). Thus, we would require the full variance matrix of the probit estimates as well as the gradient of the expression of interest, such as $(\gamma_1 + 2\gamma_2z_2) \cdot \phi(z_1\delta_1 + \gamma_1z_2 + \gamma_2z_2^2)$, with respect to all probit parameters. Alternatively, the bootstrap would be simple but require a bit more computation.
If we are interested in the average partial effect (APE) of $d_1$ going from zero to one then we estimate it as

$$N^{-1}\sum_{i=1}^{N} \{\Phi[z_{i1}\hat\delta_1 + (\hat\gamma_1 + \hat\gamma_3)z_{i2} + \hat\gamma_2] - \Phi(z_{i1}\hat\delta_1 + \hat\gamma_1z_{i2})\},$$

that is, we estimate the effect for each unit $i$ and then average these across all $i$. If we want a standard error for this, we would use the extension of the delta method worked out in Problem 12.17, to account for the averaging as well as estimation of the parameters. The bootstrap can be used, too.

d. (Bonus Question) For a fixed value of $z_2$, say $z_2^o$, how would you estimate the average partial effect of $d_1$ on the response probability?

Solution

Now we average out only with respect to $z_{i1}$:

$$N^{-1}\sum_{i=1}^{N} \{\Phi[z_{i1}\hat\delta_1 + (\hat\gamma_1 + \hat\gamma_3)z_2^o + \hat\gamma_2] - \Phi(z_{i1}\hat\delta_1 + \hat\gamma_1z_2^o)\}.$$

We can then vary $z_2^o$ to see how the effect of changing $d_1$ from zero to one varies with $z_2^o$. Again, we can use Problem 12.17 to obtain an asymptotic standard error.
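
The two averages in parts c and d differ only in whether $z_{i2}$ or the fixed value $z_2^o$ enters the index. The Python sketch below shows the arithmetic with the standard library only; the function names and every number are hypothetical placeholders (in particular, `index1[i]` stands in for $z_{i1}\hat\delta_1$ and `g1`, `g2`, `g3` for probit estimates that would come from an actual fit).

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal cdf

def ape_d1(index1, z2, g1, g2, g3):
    """Average over i of Phi(index1_i + (g1+g3)*z2_i + g2) - Phi(index1_i + g1*z2_i)."""
    diffs = [Phi(a + (g1 + g3) * b + g2) - Phi(a + g1 * b) for a, b in zip(index1, z2)]
    return sum(diffs) / len(diffs)

def ape_d1_fixed_z2(index1, z2_fixed, g1, g2, g3):
    """Same average with z2 held at z2_fixed for every unit (part d)."""
    return ape_d1(index1, [z2_fixed] * len(index1), g1, g2, g3)

# Placeholder inputs, not estimates from any data set
index1 = [-0.4, 0.1, 0.7, 1.2]
z2 = [0.5, 1.0, 1.5, 2.0]
ape = ape_d1(index1, z2, g1=0.3, g2=0.4, g3=-0.1)
ape_at_1 = ape_d1_fixed_z2(index1, 1.0, g1=0.3, g2=0.4, g3=-0.1)
```

Calling `ape_d1_fixed_z2` over a grid of values for `z2_fixed` traces out how the effect of $d_1$ varies with $z_2^o$. Standard errors are not computed here; they would require the delta method or the bootstrap, as noted above.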

15.4. This is the kind of (nonsense) statement that arises out of failure to distinguish between the underlying latent variable model and the model for $P(y = 1|x)$. To compare the LPM and probit on equal footing, we must recognize that the LPM assumes $P(y = 1|x) = x\gamma$ while the probit model assumes that $P(y = 1|x) = \Phi(x\beta)$. So the substantive difference is purely in the functional forms for the response probabilities. And the probit functional form has some attractive properties compared with the linear model: $\Phi(x\beta)$ is always between zero and one, and the marginal effect of any $x_j$ is diminishing after some point. The LPM and probit models are both approximations to the true response probability, and the LPM has some deficiencies for describing the partial effects over a broad range of the covariates.

If one insists on focusing on normality of the latent error in the probit case then one must compare that assumption with the corresponding assumption for the LPM. If we specify a latent variable as $y^* = x\gamma + e$ then the LPM is obtained when $e$ has a uniform distribution over $[-a, a]$ for some constant $0 < a < \infty$. For most purposes, this is much less plausible than the normality underlying probit.


15.5. a. If $P(y = 1|z, q) = \Phi(z_1\delta_1 + \gamma_1z_2q)$ then

$$\frac{\partial P(y = 1|z, q)}{\partial z_2} = \gamma_1q \cdot \phi(z_1\delta_1 + \gamma_1z_2q),$$

assuming that $z_2$ is not functionally related to $z_1$.

b. Write $y^* = z_1\delta_1 + r$, where $r = \gamma_1z_2q + e$, and $e$ is independent of $(z, q)$ with a standard normal distribution. Because $q$ is assumed independent of $z$, $r|z \sim \mathrm{Normal}(0, \gamma_1^2z_2^2 + 1)$; this follows because $E(r|z) = \gamma_1z_2E(q|z) + E(e|z) = 0$. Also,

$$\mathrm{Var}(r|z) = \gamma_1^2z_2^2\mathrm{Var}(q|z) + \mathrm{Var}(e|z) + 2\gamma_1z_2\mathrm{Cov}(q, e|z) = \gamma_1^2z_2^2 + 1$$

because $\mathrm{Cov}(q, e|z) = 0$ by independence between $e$ and $(z, q)$. Thus, $r/\sqrt{\gamma_1^2z_2^2 + 1}$ has a standard normal distribution independent of $z$. It follows that

$$P(y = 1|z) = \Phi\left(z_1\delta_1/\sqrt{\gamma_1^2z_2^2 + 1}\right). \quad (15.97)$$
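
Equation (15.97) can be checked by Monte Carlo at a fixed value of $z$. The Python sketch below uses the standard library only and arbitrary illustrative values: `index1` stands in for $z_1\delta_1$, and both $q$ and $e$ are drawn as independent standard normals, as assumed in part b.

```python
import math
import random
from statistics import NormalDist

def response_prob(index1, z2, gamma1):
    """P(y = 1|z) = Phi(index1 / sqrt(gamma1^2 * z2^2 + 1)), as in (15.97)."""
    return NormalDist().cdf(index1 / math.sqrt(gamma1 ** 2 * z2 ** 2 + 1.0))

random.seed(2)
index1, z2, gamma1 = 0.8, 1.5, 2.0   # arbitrary values for illustration
n = 200_000
hits = 0
for _ in range(n):
    q = random.gauss(0.0, 1.0)
    e = random.gauss(0.0, 1.0)
    hits += (index1 + gamma1 * z2 * q + e) > 0   # y = 1[y* > 0]
mc_prob = hits / n
```

The simulated frequency `mc_prob` is close to `response_prob(index1, z2, gamma1)`, and the formula returns the same value for `gamma1 = -2.0`, which illustrates why only $\gamma_1^2$ is identified.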


c. Because $P(y = 1|z)$ depends only on $\gamma_1^2$, this is what we can estimate along with $\delta_1$. (For example, $\gamma_1 = -2$ and $\gamma_1 = 2$ give exactly the same model for $P(y = 1|z)$.) This is why we define $\rho_1 = \gamma_1^2$. Testing $H_0: \rho_1 = 0$ is most easily done using the score or LM test because, under $H_0$, we have a standard probit model.

Let $\hat\delta_1$ denote the probit estimates under the null that $\rho_1 = 0$. Define $\hat\phi_i = \phi(z_{i1}\hat\delta_1)$, $\hat\Phi_i = \Phi(z_{i1}\hat\delta_1)$, $\hat{u}_i = y_i - \hat\Phi_i$, and $\tilde{u}_i = \hat{u}_i/\sqrt{\hat\Phi_i(1 - \hat\Phi_i)}$ (the standardized residuals). The gradient of the mean function in (15.97) with respect to $\delta_1$, evaluated under the null estimates, is simply $\hat\phi_iz_{i1}$. The only other quantity needed is the gradient with respect to $\rho_1$ evaluated at the null estimates. But the partial derivative of (15.97) with respect to $\rho_1$ is, for each $i$,

$$-(z_{i1}\delta_1)(z_{i2}^2/2)(\rho_1z_{i2}^2 + 1)^{-3/2}\phi\left(z_{i1}\delta_1/\sqrt{\rho_1z_{i2}^2 + 1}\right).$$

When we evaluate this at $\rho_1 = 0$ and $\hat\delta_1$ we get $-(z_{i1}\hat\delta_1)(z_{i2}^2/2)\hat\phi_i$. Then, the score statistic can be obtained as $NR_u^2$ from the regression

$$\tilde{u}_i \ \text{on} \ \frac{\hat\phi_iz_{i1}}{\sqrt{\hat\Phi_i(1 - \hat\Phi_i)}}, \ \frac{(z_{i1}\hat\delta_1)z_{i2}^2\hat\phi_i}{\sqrt{\hat\Phi_i(1 - \hat\Phi_i)}};$$

under $H_0$, $NR_u^2 \overset{a}{\sim} \chi_1^2$.

d. The model can be estimated by MLE using the formulation with $\rho_1$ in place of $\gamma_1^2$. It is not a standard probit estimation but a kind of “heteroskedastic probit.”

15.6. a. What we would like to know is that, if we exogenously change the number of 
cigarettes that someone smokes per day, what effect would this have on the probability of 
missing work over a three-month period? In other words, we want to infer causality, not just 
find a correlation between missing work and cigarette smoking. 

b. Since people choose whether and how much to smoke, we certainly cannot treat the data 
as coming from the experiment we have in mind in part a. (That is, we cannot randomly assign 
people a daily cigarette consumption.) It is possible that smokers are less healthy to begin with, 
or have other attributes that cause them to miss work more often. Or, it could go the other way: 
cigarette consumption may be related to personality traits that make people harder workers. In 
any case, cigs might be correlated with the unobservables in the equation. 


c. If we start with the model

$$P(y = 1|z, cigs, q_1) = \Phi(z_1\delta_1 + \gamma_1cigs + q_1), \quad (15.98)$$

but ignore $q_1$ when it is correlated with $cigs$, we will not consistently estimate anything of interest, whether the model is linear or nonlinear. Thus, we would not be estimating a causal effect. If $q_1$ is independent of $cigs$, the probit ignoring $q_1$ does estimate the average partial effect of another cigarette.

d. No. There are many people in the working population who do not smoke. Thus, the 
distribution (conditional or unconditional) of cigs piles up at zero. Also, since cigs takes on 
integer values, it cannot be normally distributed. But it is really the pile up at zero that is the 
most serious issue. 

e. Use the Rivers-Vuong test. Obtain the residuals, $\hat{r}_2$, from the regression $cigs$ on $z$. Then, estimate the probit of $y$ on $z_1$, $cigs$, $\hat{r}_2$ and use a standard $t$ test on $\hat{r}_2$. This does not rely on normality of $r_2$ (or $cigs$). It does, of course, rely on the probit model being correct for $y$ under $H_0$.

f. Assuming people will not immediately move out of their state of residence when the state 
implements no smoking laws in the workplace, and that state of residence is roughly 
independent of general health in the population, a dummy indicator for whether the person 
works in a state with a new law can be treated as exogenous and excluded from (15.98). (These 
situations are often called “natural experiments.”) Further, cigs is likely to be correlated with
the state law indicator because people will not be able to smoke as much as they
otherwise would. Thus, the indicator seems to be a reasonable instrument for cigs.

15.7. a. The LPM estimates, with the usual and heteroskedasticity-robust standard errors,
are given below. Interestingly, the robust standard errors on the non-demographic variables are
often notably smaller than the usual standard errors. The statistical significance of the OLS
coefficients is the same using either set of standard errors.

When pcnv goes from .25 to .75, the estimated probability of arrest falls by about .077, or
7.7 percentage points.

. use grogger

. gen arr86 = 0

. replace arr86 = 1 if narr86 > 0
(755 real changes made)

. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

      Source |       SS       df       MS              Number of obs =    2725
-------------+------------------------------           F(  8,  2716) =   30.48
       Model |  44.9720916     8  5.62151145           Prob > F      =  0.0000
    Residual |  500.844422  2716  .184405163           R-squared     =  0.0824
-------------+------------------------------           Adj R-squared =  0.0797
       Total |  545.816514  2724   .20037317           Root MSE      =  .42942

       arr86 |      Coef.   Std. Err.       t     [95% Conf. Interval]
-------------+--------------------------------------------------------
        pcnv |  -.1543802   .0209336    -7.37    -.1954275   -.1133329
      avgsen |   .0035024   .0063417     0.55    -.0089326    .0159374
     tottime |  -.0020613   .0048884    -0.42    -.0116466     .007524
     ptime86 |  -.0215953   .0044679    -4.83    -.0303561   -.0128344
       inc86 |  -.0012248    .000127    -9.65    -.0014738   -.0009759
       black |   .1617183   .0235044     6.88     .1156299    .2078066
      hispan |   .0892586   .0205592     4.34     .0489454    .1295718
      born60 |   .0028698   .0171986     0.17    -.0308539    .0365936
       _cons |   .3609831   .0160927    22.43      .329428    .3925382

. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60, robust

Linear regression                                      Number of obs =    2725
                                                       F(  8,  2716) =   37.59
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0824
                                                       Root MSE      =  .42942

             |               Robust
       arr86 |      Coef.   Std. Err.       t     [95% Conf. Interval]
-------------+--------------------------------------------------------
        pcnv |  -.1543802    .018964    -8.14    -.1915656   -.1171948
      avgsen |   .0035024   .0058876     0.59    -.0080423    .0150471
     tottime |  -.0020613   .0042256    -0.49     -.010347    .0062244
     ptime86 |  -.0215953   .0027532    -7.84    -.0269938   -.0161967
       inc86 |  -.0012248   .0001141   -10.73    -.0014487    -.001001
       black |   .1617183   .0255279     6.33     .1116622    .2117743
      hispan |   .0892586   .0210689     4.24     .0479459    .1305714
      born60 |   .0028698   .0171596     0.17    -.0307774     .036517
       _cons |   .3609831   .0167081    21.61     .3282214    .3937449

. di .5*_b[pcnv]
-.0771901


b. The robust statistic and its p-value are obtained by using the “test” command after
appending “robust” to the regression command. The p-values are virtually identical.

. qui reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

. test avgsen tottime

 ( 1)  avgsen = 0
 ( 2)  tottime = 0

       F(  2,  2716) =    0.18
            Prob > F =    0.8360

. qui reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60, robust

. test avgsen tottime

 ( 1)  avgsen = 0
 ( 2)  tottime = 0

       F(  2,  2716) =    0.18
            Prob > F =    0.8320


c. The probit model estimates follow.

. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60

Probit regression
Log likelihood = -1483.6406

       arr86 |      Coef.   Std. Err.    P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------
        pcnv |  -.5529248   .0720778    0.000    -.6941947   -.4116549
      avgsen |   .0127395   .0212318    0.548     -.028874    .0543531
     tottime |  -.0076486   .0168844    0.651    -.0407414    .0254442
     ptime86 |  -.0812017    .017963    0.000    -.1164085   -.0459949
       inc86 |  -.0046346   .0004777    0.000    -.0055709   -.0036983
       black |   .4666076   .0719687    0.000     .3255516    .6076635
      hispan |   .2911005   .0654027    0.000     .1629135    .4192875
      born60 |   .0112074   .0556843    0.840    -.0979318    .1203466
       _cons |  -.3138331   .0512999    0.000    -.4143791    -.213287


Now, we must compute the difference in the normal cdf at the two different values of pcnv,
with black = 1, hispan = 0, born60 = 1, and the remaining variables at their sample averages.

. sum avgsen tottime ptime86 inc86

    Variable |   Obs        Mean    Std. Dev.    Min    Max
-------------+---------------------------------------------
      avgsen |  2725    .6322936    3.508031
     tottime |  2725    .8387523    4.607019
     ptime86 |  2725     .387156    1.950051      0     12
       inc86 |  2725    54.96705    66.62721      0    541

. di normal(_b[_cons] + _b[pcnv]*.75 + _b[avgsen]*.6322936
>     + _b[tottime]*.8387523 + _b[ptime86]*.387156 + _b[inc86]*54.96705
>     + _b[black] + _b[born60])
>     - normal(_b[_cons] + _b[pcnv]*.25 + _b[avgsen]*.6322936
>     + _b[tottime]*.8387523 + _b[ptime86]*.387156 + _b[inc86]*54.96705
>     + _b[black] + _b[born60])
-.10166064

This last command shows that the probability falls by about .102, which is somewhat larger 
than the effect obtained from the LPM. 
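As a cross-check on the arithmetic, the following Python sketch recomputes the change in the estimated response probability from the rounded coefficients and sample means printed in the Stata output. The function name `normal_cdf` and the dictionary layout are illustrative choices, not part of the original session; because the inputs are rounded, the result agrees with Stata only to several decimal places.

```python
import math

def normal_cdf(z):
    """Standard normal cdf, the analogue of Stata's normal()."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probit coefficients as printed in the Stata output (rounded).
b = {"_cons": -.3138331, "pcnv": -.5529248, "avgsen": .0127395,
     "tottime": -.0076486, "ptime86": -.0812017, "inc86": -.0046346,
     "black": .4666076, "hispan": .2911005, "born60": .0112074}

# Sample means for the continuous variables; black = 1, hispan = 0, born60 = 1.
base = {"avgsen": .6322936, "tottime": .8387523, "ptime86": .387156,
        "inc86": 54.96705, "black": 1.0, "hispan": 0.0, "born60": 1.0}

def index(pcnv):
    """Probit index at the chosen covariate values for a given pcnv."""
    return b["_cons"] + b["pcnv"] * pcnv + sum(b[k] * v for k, v in base.items())

probit_change = normal_cdf(index(.75)) - normal_cdf(index(.25))
lpm_change = .5 * -.1543802   # LPM effect: half the pcnv coefficient

print(round(probit_change, 4), round(lpm_change, 4))
```

The probit change depends on where the other covariates are fixed, whereas the LPM change is the same at every covariate value.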

d. To obtain the percent correctly predicted for each outcome, we first generate the
predicted values of arr86 as described on page 465:

. predict PHIhat
(option pr assumed; Pr(arr86))

. gen arr86t = PHIhat >= .5

. tab arr86t arr86

           |         arr86
    arr86t |         0          1 |     Total
-----------+----------------------+----------
         0 |     1,903        677 |     2,580
         1 |        67         78 |       145
-----------+----------------------+----------
     Total |     1,970        755 |     2,725

. di 1903/1970
.96598985

. di 78/755
.10331126

. di (1903 + 78)/2725
.72697248


For men who were not arrested, the probit predicts correctly about 96.6% of the time. 
Unfortunately, for the men who were arrested, the probit is correct only about 10.3% of the 
time. The overall percent correctly predicted is pretty high — 72.7% — but we cannot very well 
predict the outcome we would most like to predict. 
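The three percentages can be reproduced from the four cell counts in the tabulation. The Python sketch below uses a hypothetical helper, `percent_correct`, written only for this illustration.

```python
# Cell counts keyed by (predicted arr86, actual arr86), from the tabulation.
counts = {(0, 0): 1903, (0, 1): 677, (1, 0): 67, (1, 1): 78}

def percent_correct(counts, outcome=None):
    """Share of observations whose prediction equals the actual outcome.

    With outcome=0 or outcome=1, only observations with that actual
    value are used; with outcome=None, all observations are used.
    """
    cells = [(pred, act, n) for (pred, act), n in counts.items()
             if outcome is None or act == outcome]
    total = sum(n for _, _, n in cells)
    hits = sum(n for pred, act, n in cells if pred == act)
    return hits / total

pc_not_arrested = percent_correct(counts, outcome=0)
pc_arrested = percent_correct(counts, outcome=1)
pc_overall = percent_correct(counts)

print(pc_not_arrested, pc_arrested, pc_overall)
```

The overall figure is a weighted average of the two outcome-specific figures, with weights 1970/2725 and 755/2725, which is why it can look respectable even though arrests are rarely predicted.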


e. Adding the quadratic terms gives

. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60
>     pcnvsq pt86sq inc86sq

Probit regression                                 Number of obs   =       2725
                                                  LR chi2(11)     =     336.77
                                                  Prob > chi2     =     0.0000
Log likelihood = -1439.8005                       Pseudo R2       =     0.1047

       arr86 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |   .2167615   .2604937     0.83   0.405    -.2937968    .7273198
      avgsen |   .0139969   .0244972     0.57   0.568    -.0340166    .0620105
     tottime |  -.0178158   .0199703    -0.89   0.372     -.056957    .0213253
     ptime86 |   .7449712   .1438485     5.18   0.000     .4630333    1.026909
       inc86 |  -.0058786   .0009851    -5.97   0.000    -.0078094   -.0039478
       black |   .4368131   .0733798     5.95   0.000     .2929913     .580635
      hispan |   .2663945    .067082     3.97   0.000     .1349163    .3978727
      born60 |  -.0145223   .0566913    -0.26   0.798    -.1256351    .0965905
      pcnvsq |  -.8570512   .2714575    -3.16   0.002    -1.389098   -.3250042
      pt86sq |  -.1035031   .0224234    -4.62   0.000    -.1474522    -.059554
     inc86sq |   8.75e-06   4.28e-06     2.04   0.041     3.63e-07    .0000171
       _cons |   -.337362   .0562665    -6.00   0.000    -.4476423   -.2270817

Note: 51 failures and 0 successes completely determined.

. test pcnvsq pt86sq inc86sq

 ( 1)  pcnvsq = 0
 ( 2)  pt86sq = 0
 ( 3)  inc86sq = 0

           chi2(  3) =   38.54
         Prob > chi2 =    0.0000

The quadratics are individually and jointly significant. The quadratic in pcnv means that, at
low levels of pcnv, there is actually a positive relationship between probability of arrest and
pcnv, which does not make much sense. The turning point is easily found as
.217/(2(.857)) ≈ .127, and there are many cases (1,265) where pcnv is less than .127.

. sum pcnv

    Variable |   Obs        Mean    Std. Dev.    Min    Max
-------------+---------------------------------------------
        pcnv |  2725    .3577872     .395192      0      1

. count if pcnv < .127
 1265
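The turning point comes from setting the derivative of the index with respect to pcnv, b1 + 2·b2·pcnv, to zero. A short Python sketch using the rounded coefficients on pcnv and pcnvsq from the probit output (the variable names are illustrative):

```python
# Coefficients on pcnv and its square, as printed in the probit output.
b_pcnv, b_pcnvsq = .2167615, -.8570512

def index_slope(pcnv):
    """Derivative of the probit index with respect to pcnv."""
    return b_pcnv + 2.0 * b_pcnvsq * pcnv

# The slope is zero at pcnv* = -b1 / (2*b2).
turning_point = -b_pcnv / (2.0 * b_pcnvsq)

print(round(turning_point, 4))
print(index_slope(0.05) > 0, index_slope(0.50) < 0)
```

Because the normal cdf is strictly increasing, the sign of the effect on the response probability matches the sign of this slope: positive below the turning point and negative above it.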


15.8. a. The following Stata session answers this part. The difference in estimated
probabilities of smoking at 16 and 12 years of education is about -.080. In other words, for
non-white women at the average family income, women with 16 years of education are, on
average, about eight percentage points less likely to smoke.

. use bwght

. gen smokes = cigs > 0

. tab smokes

     smokes |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,176       84.73       84.73
          1 |        212       15.27      100.00
------------+-----------------------------------
      Total |      1,388      100.00

. probit smokes motheduc white lfaminc

Probit regression                                 Number of obs   =       1387
                                                  LR chi2(3)      =      92.67
                                                  Prob > chi2     =     0.0000
Log likelihood = -546.76991                       Pseudo R2       =     0.0781

      smokes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    motheduc |  -.1450599   .0207899    -6.98   0.000    -.1858074   -.1043124
       white |   .1896765   .1098805     1.73   0.084    -.0256853    .4050383
     lfaminc |  -.1669109   .0498894    -3.35   0.001    -.2646923   -.0691296
       _cons |   1.126276   .2504611     4.50   0.000     .6353817    1.617171

. sum faminc

    Variable |   Obs        Mean    Std. Dev.    Min    Max
-------------+---------------------------------------------
      faminc |  1388    29.02666    18.73928     .5     65

. di normal(_b[_cons] + _b[motheduc]*16 + _b[lfaminc]*log(29.02666))
>     - normal(_b[_cons] + _b[motheduc]*12 + _b[lfaminc]*log(29.02666))
-.08020112


b. The variable faminc is probably not exogenous because, at a minimum, income is likely
correlated with quality of health care. It might also be correlated with unobserved cultural
factors that are correlated with smoking.


c. The reduced form equation for lfaminc is estimated below. As expected, fatheduc has a
positive partial effect on lfaminc, and the relationship is statistically significant. We need the
residuals from this regression for part d. We lose 196 observations due to missing data on
fatheduc, and one observation has already been lost due to a missing value for motheduc.

. reg lfaminc motheduc white fatheduc

      Source |       SS       df       MS              Number of obs =    1191
-------------+------------------------------           F(  3,  1187) =  119.23
       Model |  140.936735     3  46.9789115           Prob > F      =  0.0000
    Residual |  467.690904  1187  .394010871           R-squared     =  0.2316
-------------+------------------------------           Adj R-squared =  0.2296
       Total |  608.627639  1190  .511451797           Root MSE      =   .6277

     lfaminc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    motheduc |   .0709044   .0098338     7.21   0.000     .0516109     .090198
       white |   .3452115    .050418     6.85   0.000     .2462931    .4441298
    fatheduc |   .0616625    .008708     7.08   0.000     .0445777    .0787473
       _cons |   1.241413   .1103648    11.25   0.000     1.024881    1.457945

. predict v2hat, resid
(197 missing values generated)

d. To test the null of exogeneity, we estimate the probit that includes $\hat v_2$:

. probit smokes motheduc white lfaminc v2hat

Iteration 0:   log likelihood = -471.77574
Iteration 1:   log likelihood = -432.90303
Iteration 2:   log likelihood =  -432.0639
Iteration 3:   log likelihood = -432.06242
Iteration 4:   log likelihood = -432.06242

Probit regression                                 Number of obs   =       1191
                                                  LR chi2(4)      =      79.43
                                                  Prob > chi2     =     0.0000
Log likelihood = -432.06242                       Pseudo R2       =     0.0842

      smokes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    motheduc |  -.0826247   .0465204    -1.78   0.076     -.173803    .0085536
       white |   .4611075   .1965245     2.35   0.019     .0759265    .8462886
     lfaminc |  -.7622559   .3652949    -2.09   0.037    -1.478221    -.046291
       v2hat |   .6107298   .3708071     1.65   0.100    -.1160387    1.337498
       _cons |    1.98796   .5996374     3.32   0.001     .8126927    3.163228


There is not strong evidence of endogeneity, but the sign of the coefficient on v2hat is what
we expect: unobservables that lead to higher income are positively correlated with unobserved
factors affecting smoking. There is a further problem in that using this test presumes
fatheduc can be omitted from the smoking equation. Remember, the test can be interpreted
as a test for endogeneity of lfaminc only when we maintain that fatheduc is exogenous.

Because of the potential endogeneity of fatheduc, this is perhaps not a very good example, but it
shows you how to mechanically carry out the tests.

Incidentally, the probit coefficients on lfaminc are very different depending on whether we
treat it as exogenous or not. This is true even if we use the same samples, as the Stata output
below shows. The APE is probably quite different, too. It is hard to know what to do in such
cases (which are all too common).

. probit smokes motheduc white lfaminc if v2hat != .

Probit regression                                 Number of obs   =       1191
                                                  LR chi2(3)      =      76.72
                                                  Prob > chi2     =     0.0000
Log likelihood = -433.41656                       Pseudo R2       =     0.0813

      smokes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    motheduc |  -.1497589   .0225634    -6.64   0.000    -.1939823   -.1055355
       white |   .2323285    .137875     1.69   0.092    -.0379015    .5025584
     lfaminc |  -.1719479   .0687396    -2.50   0.012     -.306675   -.0372207
       _cons |   1.133026   .2990124     3.79   0.000     .5469727     1.71908


15.9. a. Let $P(y = 1 \mid x) = x\beta$, where $x_1 = 1$. Then for each i,

$$\ell_i(\beta) = y_i \log(x_i\beta) + (1 - y_i)\log(1 - x_i\beta),$$

which is only well-defined for $0 < x_i\beta < 1$.

b. For any possible estimate $\hat\beta$, the log-likelihood function is well-defined only if
$0 < x_i\hat\beta < 1$ for all $i = 1, \ldots, N$. Therefore, during the iterations to obtain the MLE, this
condition must be checked. It may be impossible to find an estimate that satisfies these
inequalities for every observation, especially if N is large.

c. This follows from the KLIC, and the discussion of Vuong’s model selection statistic in
Section 13.11.2: the true density of y given x (evaluated at the true values, of course)
maximizes the expected log-likelihood, which is equivalent to minimizing the KLIC. Because the MLEs
are consistent for the unknown parameters,
asymptotically the true density will produce the highest average log-likelihood function. So,
just as we can use an R-squared to choose among different functional forms for E(y|x), we can
use values of the log-likelihood to choose among different models for P(y = 1|x) when y is
binary.
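The check described in part b can be sketched in a few lines of Python. The function `lpm_loglik` and the toy data are hypothetical and exist only to show where the log-likelihood breaks down; an actual optimizer would have to apply the same check at every iteration.

```python
import math

def lpm_loglik(beta, X, y):
    """Bernoulli log-likelihood when P(y = 1 | x) = x*beta.

    Returns None as soon as a fitted probability falls outside (0, 1),
    because log(p) or log(1 - p) is then undefined.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        p = sum(b * x for b, x in zip(beta, xi))
        if not 0.0 < p < 1.0:
            return None
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# Toy data: a constant and one regressor taking the values 0, 1, 2, 3.
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [0, 0, 1, 1]

ok = lpm_loglik((0.1, 0.25), X, y)    # fitted probabilities .1, .35, .6, .85
bad = lpm_loglik((0.1, 0.35), X, y)   # fitted probability at x = 3 is 1.15

print(ok, bad)
```

The second candidate is rejected because of a single observation. With more observations and a wider spread in x, more candidate values of beta are ruled out in the same way.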

15.10. a. There are several possibilities. One is to define $\hat p_i = \hat P(y = 1 \mid x_i)$, the estimated
response probabilities, and obtain the square of the correlation between $y_i$ and $\hat p_i$. For the
LPM, this is just the usual R-squared. For the general index model, $G(x_i\hat\beta)$ is the estimate of
$E(y \mid x_i)$, and so it makes sense to compute an analogous goodness-of-fit measure. This
measure is always between zero and one.

An alternative is to use the sum of squared residuals form. While this produces the same 
R-squared measure for the linear model, it does not for nonlinear models. 
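The two measures from part a can be compared directly. In the Python sketch below, the outcomes and fitted probabilities are made-up numbers standing in for the output of a nonlinear model; the function names are illustrative.

```python
def corr_squared(y, p):
    """Squared sample correlation between outcomes y and fitted probabilities p."""
    n = len(y)
    ybar, pbar = sum(y) / n, sum(p) / n
    syp = sum((a - ybar) * (b - pbar) for a, b in zip(y, p))
    syy = sum((a - ybar) ** 2 for a in y)
    spp = sum((b - pbar) ** 2 for b in p)
    return syp ** 2 / (syy * spp)

def ssr_r_squared(y, p):
    """R-squared in sum-of-squared-residuals form: 1 - SSR/SST."""
    ybar = sum(y) / len(y)
    ssr = sum((a - b) ** 2 for a, b in zip(y, p))
    sst = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ssr / sst

# Hypothetical binary outcomes and fitted probabilities.
y = [0, 0, 1, 1, 1, 0]
p_hat = [.2, .4, .7, .9, .6, .3]

r2_corr = corr_squared(y, p_hat)
r2_ssr = ssr_r_squared(y, p_hat)
print(round(r2_corr, 4), round(r2_ssr, 4))
```

The two numbers differ here because these fitted values are not OLS fitted values from a regression with an intercept, which is the case in which the two forms coincide.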

b. The Stata output below gives the square of the correlation between $y_i$ and the fitted
probabilities for the LPM and probit. The LPM R-squared is about .106 and that for probit is
higher, about .115. So probit is preferred based on this goodness-of-fit measure, although the
improvement is not overwhelming. (It is about an 8.5% increase in the R-squared.)

. reg arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60
>     pcnvsq pt86sq inc86sq

      Source |       SS       df       MS              Number of obs =    2725
-------------+------------------------------           F( 11,  2713) =   29.27
       Model |  57.8976285    11  5.26342077           Prob > F      =  0.0000
    Residual |  487.918885  2713  .179844779           R-squared     =  0.1061
-------------+------------------------------           Adj R-squared =  0.1025
       Total |  545.816514  2724   .20037317           Root MSE      =  .42408

       arr86 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |    .075977   .0803402     0.95   0.344    -.0815573    .2335112
      avgsen |   .0012998   .0062692     0.21   0.836    -.0109932    .0135927
     tottime |  -.0022213   .0048287    -0.46   0.646    -.0116897     .007247
     ptime86 |   .1321786   .0230021     5.75   0.000     .0870752     .177282
       inc86 |  -.0018505   .0002737    -6.76   0.000    -.0023872   -.0013139
       black |   .1447942   .0233225     6.21   0.000     .0990627    .1905258
      hispan |   .0803938   .0204959     3.92   0.000     .0402047    .1205829
      born60 |  -.0062993   .0170252    -0.37   0.711     -.039683    .0270843
      pcnvsq |  -.2456865   .0812584    -3.02   0.003    -.4050211   -.0863519
      pt86sq |  -.0139981   .0020109    -6.96   0.000    -.0179411   -.0100551
     inc86sq |   3.31e-06   1.09e-06     3.03   0.002     1.17e-06    5.45e-06
       _cons |    .363352   .0175536    20.70   0.000     .3289323    .3977718

. probit arr86 pcnv avgsen tottime ptime86 inc86 black hispan born60
>     pcnvsq pt86sq inc86sq

Probit regression                                 Number of obs   =       2725
                                                  LR chi2(11)     =     336.77
                                                  Prob > chi2     =     0.0000
Log likelihood = -1439.8005                       Pseudo R2       =     0.1047

       arr86 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |   .2167615   .2604937     0.83   0.405    -.2937968    .7273198
      avgsen |   .0139969   .0244972     0.57   0.568    -.0340166    .0620105
     tottime |  -.0178158   .0199703    -0.89   0.372     -.056957    .0213253
     ptime86 |   .7449712   .1438486     5.18   0.000     .4630332    1.026909
       inc86 |  -.0058786   .0009851    -5.97   0.000    -.0078094   -.0039478
       black |   .4368131   .0733798     5.95   0.000     .2929913     .580635
      hispan |   .2663945    .067082     3.97   0.000     .1349163    .3978727
      born60 |  -.0145223   .0566913    -0.26   0.798    -.1256351    .0965905
      pcnvsq |  -.8570512   .2714575    -3.16   0.002    -1.389098   -.3250042
      pt86sq |  -.1035031   .0224234    -4.62   0.000    -.1474522    -.059554
     inc86sq |   8.75e-06   4.28e-06     2.04   0.041     3.63e-07    .0000171
       _cons |   -.337362   .0562665    -6.00   0.000    -.4476423   -.2270817

Note: 51 failures and 0 successes completely determined.

. predict PHIhat
(option pr assumed; Pr(arr86))

. corr PHIhat arr86
(obs=2725)

             |   PHIhat    arr86
-------------+------------------
      PHIhat |   1.0000
       arr86 |   0.3396   1.0000

. di .3396^2
.11532816

15.11. We really need to make two assumptions. The first is a conditional independence
assumption: given $x_i = (x_{i1}, \ldots, x_{iT})$, $(y_{i1}, \ldots, y_{iT})$ are independent. This allows us to write

$$f(y_1, \ldots, y_T \mid x_i) = f_1(y_1 \mid x_i) \cdots f_T(y_T \mid x_i),$$

that is, the joint density (conditional on $x_i$) is the product of the marginal densities (each
conditional on $x_i$). The second assumption is a strict exogeneity assumption:
$D(y_{it} \mid x_i) = D(y_{it} \mid x_{it})$, $t = 1, \ldots, T$. When we add the standard assumption for pooled probit,
that $D(y_{it} \mid x_{it})$ follows a probit model, then

$$f(y_1, \ldots, y_T \mid x_i) = \prod_{t=1}^{T} [\Phi(x_{it}\beta)]^{y_t} [1 - \Phi(x_{it}\beta)]^{1 - y_t},$$

and so pooled probit is conditional MLE.


15.12. We can extend the T = 2 case used to obtain equation (15.81):

$$P(y_{i1} = 1 \mid x_i, c_i, n_i = 1) = \frac{P(y_{i1} = 1, n_i = 1 \mid x_i, c_i)}{P(n_i = 1 \mid x_i, c_i)}$$
$$= \frac{P(y_{i1} = 1, y_{i2} = 0, y_{i3} = 0 \mid x_i, c_i)}{P(y_{i1} = 1, y_{i2} = 0, y_{i3} = 0 \mid x_i, c_i) + P(y_{i1} = 0, y_{i2} = 1, y_{i3} = 0 \mid x_i, c_i) + P(y_{i1} = 0, y_{i2} = 0, y_{i3} = 1 \mid x_i, c_i)}.$$

Now, we just use the conditional independence assumption (across t) and the logistic
functional form:

$$P(y_{i1} = 1, y_{i2} = 0, y_{i3} = 0 \mid x_i, c_i) = \Lambda(x_{i1}\beta + c_i)\,[1 - \Lambda(x_{i2}\beta + c_i)]\,[1 - \Lambda(x_{i3}\beta + c_i)],$$
$$P(y_{i1} = 0, y_{i2} = 1, y_{i3} = 0 \mid x_i, c_i) = [1 - \Lambda(x_{i1}\beta + c_i)]\,\Lambda(x_{i2}\beta + c_i)\,[1 - \Lambda(x_{i3}\beta + c_i)],$$

and

$$P(y_{i1} = 0, y_{i2} = 0, y_{i3} = 1 \mid x_i, c_i) = [1 - \Lambda(x_{i1}\beta + c_i)]\,[1 - \Lambda(x_{i2}\beta + c_i)]\,\Lambda(x_{i3}\beta + c_i).$$

Now, the term

$$1/\{[1 + \exp(x_{i1}\beta + c_i)] \cdot [1 + \exp(x_{i2}\beta + c_i)] \cdot [1 + \exp(x_{i3}\beta + c_i)]\}$$

appears multiplicatively in both the numerator and denominator, and so it disappears.
Therefore,

$$P(y_{i1} = 1 \mid x_i, c_i, n_i = 1) = \frac{\exp(x_{i1}\beta + c_i)}{\exp(x_{i1}\beta + c_i) + \exp(x_{i2}\beta + c_i) + \exp(x_{i3}\beta + c_i)} = \frac{\exp(x_{i1}\beta)}{\exp(x_{i1}\beta) + \exp(x_{i2}\beta) + \exp(x_{i3}\beta)}.$$

Also,

$$P(y_{i2} = 1 \mid x_i, c_i, n_i = 1) = \frac{\exp(x_{i2}\beta)}{\exp(x_{i1}\beta) + \exp(x_{i2}\beta) + \exp(x_{i3}\beta)}$$

and

$$P(y_{i3} = 1 \mid x_i, c_i, n_i = 1) = \frac{\exp(x_{i3}\beta)}{\exp(x_{i1}\beta) + \exp(x_{i2}\beta) + \exp(x_{i3}\beta)}.$$

Incidentally, a consistent estimator of $\beta$ is obtained using only the $n_i = 1$ observations and
applying conditional logit, as described in Chapter 16. This approach would be inefficient
because it does not use the $n_i = 2$ observations.

A similar argument can be used for the three possible configurations with $n_i = 2$, which
leads to the log-likelihood conditional on $(x_i, n_i)$, where $c_i$ has dropped out. For example,

$$P(y_{i1} = 1, y_{i2} = 1 \mid x_i, c_i, n_i = 2) = \frac{\exp[(x_{i1} + x_{i2})\beta]}{\exp[(x_{i1} + x_{i2})\beta] + \exp[(x_{i1} + x_{i3})\beta] + \exp[(x_{i2} + x_{i3})\beta]}.$$
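The cancellation of the heterogeneity term can be confirmed numerically. The Python sketch below builds the conditional probability from the unconditional logit probabilities for several values of c and compares it with the closed form; the index values in `xb` are hypothetical.

```python
import math

def logistic(a):
    """Logistic cdf, Lambda(a)."""
    return 1.0 / (1.0 + math.exp(-a))

def cond_prob_first(xb, c):
    """P(y_i1 = 1 | x_i, c_i, n_i = 1) for T = 3, built from the
    unconditional probabilities under conditional independence."""
    lam = [logistic(v + c) for v in xb]

    def prob(config):
        # Probability of one particular sequence (y_i1, y_i2, y_i3).
        return math.prod(l if yt == 1 else 1.0 - l for l, yt in zip(lam, config))

    numerator = prob((1, 0, 0))
    denominator = prob((1, 0, 0)) + prob((0, 1, 0)) + prob((0, 0, 1))
    return numerator / denominator

xb = [0.3, -0.5, 1.2]   # hypothetical values of x_i1*beta, x_i2*beta, x_i3*beta
closed_form = math.exp(xb[0]) / sum(math.exp(v) for v in xb)
values = [cond_prob_first(xb, c) for c in (-2.0, 0.0, 3.5)]

print(closed_form, values)
```

All three values agree with the closed form up to rounding error, which is the sense in which the heterogeneity has dropped out of the conditional likelihood.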


15.13. a. If there are no covariates, there is no point in using any method other than a 
straight comparison of means — in particular, the difference-in-differences approach described 
in Section 6.5.2. The estimated probabilities for the treatment and control groups, both before 
and after the policy change, will be identical to the sample proportions regardless of the model 
we use. 

b. Let d2 be a binary indicator for the second time period, and let dB be an indicator for the
treatment group. Then a probit model to evaluate the treatment effect is

$$P(y = 1 \mid x) = \Phi(\delta_0 + \delta_1 d2 + \delta_2 dB + \delta_3 d2 \cdot dB + x\gamma),$$

where x is a vector of covariates. We would estimate all parameters from a probit of y on
1, d2, dB, d2·dB, and x using all observations. Once we have the estimates, we need to
compute the “difference-in-differences” estimate, which requires either plugging in a value for
x, say $\bar x$, or averaging the differences across $x_i$. In the former case, we have

$$\hat\tau_{PAE} = [\Phi(\hat\delta_0 + \hat\delta_1 + \hat\delta_2 + \hat\delta_3 + \bar x\hat\gamma) - \Phi(\hat\delta_0 + \hat\delta_2 + \bar x\hat\gamma)] - [\Phi(\hat\delta_0 + \hat\delta_1 + \bar x\hat\gamma) - \Phi(\hat\delta_0 + \bar x\hat\gamma)],$$

and in the latter we have

$$\hat\tau_{APE} = N^{-1}\sum_{i=1}^{N} \{[\Phi(\hat\delta_0 + \hat\delta_1 + \hat\delta_2 + \hat\delta_3 + x_i\hat\gamma) - \Phi(\hat\delta_0 + \hat\delta_2 + x_i\hat\gamma)] - [\Phi(\hat\delta_0 + \hat\delta_1 + x_i\hat\gamma) - \Phi(\hat\delta_0 + x_i\hat\gamma)]\}.$$

Probably $\hat\tau_{APE}$ is preferred as it averages each of the estimated “treatment effects” (see
Chapter 21) across all units.

c. We would have to use the delta method to obtain a valid standard error for either $\hat\tau_{PAE}$ or
$\hat\tau_{APE}$, with the latter using the extension in Problem 12.17.
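To make the two estimates concrete, the Python sketch below evaluates both formulas. Every number in it (the coefficient values and the four covariate values) is hypothetical and chosen only for illustration; in practice the coefficients would come from the pooled probit.

```python
import math

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def did_effect(d, g, x):
    """Difference-in-differences of probit response probabilities at covariates x:
    (treated, after - before) minus (control, after - before)."""
    d0, d1, d2, d3 = d
    xg = sum(gk * xk for gk, xk in zip(g, x))
    treated = Phi(d0 + d1 + d2 + d3 + xg) - Phi(d0 + d2 + xg)
    control = Phi(d0 + d1 + xg) - Phi(d0 + xg)
    return treated - control

# Hypothetical estimates (delta0, delta1, delta2, delta3) and gamma.
d_hat = (-0.4, 0.1, 0.2, 0.3)
g_hat = (0.5,)

# Hypothetical covariate values for N = 4 units (one covariate).
X = [(-1.0,), (0.0,), (0.5,), (2.5,)]
xbar = tuple(sum(col) / len(X) for col in zip(*X))

tau_pae = did_effect(d_hat, g_hat, xbar)                       # effect at the average
tau_ape = sum(did_effect(d_hat, g_hat, x) for x in X) / len(X) # average of effects

print(round(tau_pae, 4), round(tau_ape, 4))
```

The two numbers differ because the normal cdf is nonlinear, so the effect at the average covariate value is not the average of the unit-level effects.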


15.14. a. First plug in for $y_2$ from (15.40):

$$1[z_1\delta_1 + y_2 z_2\alpha_1 + u_1 > 0] = 1[z_1\delta_1 + (z\delta_2 + v_2) z_2\alpha_1 + u_1 > 0]$$
$$= 1[z_1\delta_1 + (z\delta_2) z_2\alpha_1 + u_1 + v_2 z_2\alpha_1 > 0].$$

Given the assumptions, $u_1 + v_2 z_2\alpha_1$ has a mean zero normal distribution conditional on z. Its
variance is

$$Var(u_1 + v_2 z_2\alpha_1 \mid z) = 1 + 2\eta_1 z_2\alpha_1 + \tau_2^2 (z_2\alpha_1)^2,$$

where $\eta_1 = Cov(v_2, u_1)$ and $\tau_2^2 = Var(v_2)$. So we can write

$$P(y_1 = 1 \mid z) = 1 - \Phi\left(\frac{-[z_1\delta_1 + (z\delta_2) z_2\alpha_1]}{\sqrt{1 + 2\eta_1 z_2\alpha_1 + \tau_2^2 (z_2\alpha_1)^2}}\right) = \Phi\left(\frac{z_1\delta_1 + (z\delta_2) z_2\alpha_1}{\sqrt{1 + 2\eta_1 z_2\alpha_1 + \tau_2^2 (z_2\alpha_1)^2}}\right),$$

which is a heteroskedastic-probit model (but not with exponential heteroskedasticity in the
latent error).

b. This two-step procedure is inconsistent because the response probability $P(y_1 = 1 \mid z)$
does not have the usual probit form

$$\Phi[z_1\delta_1 + (z\delta_2) z_2\alpha_1].$$

Under the assumptions given, the first-stage estimation of $\delta_2$ is not the problem: OLS is
consistent. It is the misspecified functional form in the second stage that causes the problem.

c. A control function method works nicely here. Scaled coefficients are easily estimated
and then $\delta_1$ and $\alpha_1$ can be recovered using the same approach in Section 15.7.2. In addition,
average partial effects are easily estimated after control function estimation.

Under (15.40), independence, and bivariate normality, we can write $u_1$ as in equation
(15.42) and then substitute:

$$y_1 = 1[z_1\delta_1 + y_2 z_2\alpha_1 + \theta_1 v_2 + e_1 > 0]$$
$$e_1 \mid z, y_2, v_2 \sim Normal(0, 1 - \rho_1^2).$$

Following the same argument in Section 15.7.2 we have

$$P(y_1 = 1 \mid z, y_2, v_2) = \Phi(z_1\delta_{\rho 1} + y_2 z_2\alpha_{\rho 1} + \theta_{\rho 1} v_2),$$

where $\delta_{\rho 1} = \delta_1/(1 - \rho_1^2)^{1/2}$, $\alpha_{\rho 1} = \alpha_1/(1 - \rho_1^2)^{1/2}$, and $\theta_{\rho 1} = \theta_1/(1 - \rho_1^2)^{1/2}$. Therefore, the
following two-step CF method, which extends Procedure 15.1, consistently estimates the
scaled parameters: (i) Regress $y_2$ on z and obtain the OLS residuals, $\hat v_2$. (ii) Run a probit of $y_1$
on $z_1$, $y_2 z_2$, and $\hat v_2$.

Letting $\beta_1 = (\delta_1', \alpha_1')'$ and $\beta_{\rho 1} = \beta_1/(1 - \rho_1^2)^{1/2}$, we use exactly the same unscaling of the
parameters as before. Namely,

$$\hat\beta_1 = \hat\beta_{\rho 1}/(1 + \hat\theta_{\rho 1}^2 \hat\tau_2^2)^{1/2},$$

where $\hat\tau_2^2$ is the estimate of $\tau_2^2 = Var(v_2)$. The estimator in equation (15.45) can still be used.

The approach to estimating the APEs follows directly from the estimator of the average
structural function in equation (15.47). Allowing for the interactions,

$$\widehat{ASF}(z_1, y_2) = N^{-1}\sum_{i=1}^{N} \Phi(z_1\hat\delta_{\rho 1} + y_2 z_2\hat\alpha_{\rho 1} + \hat\theta_{\rho 1}\hat v_{i2}).$$

As usual, we can take derivatives or changes with respect to the elements of $(z_1, y_2)$ to obtain
estimated APEs.
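The unscaling step can be checked at the population level. The Python sketch below starts from hypothetical structural values, forms the scaled parameters that the second-step probit identifies, and then applies the unscaling formula. It assumes the relationships behind equation (15.42): Var(u1) = 1, theta1 = eta1/tau2^2, and rho1 = Corr(v2, u1).

```python
import math

# Hypothetical structural values (assumptions for illustration only).
eta1 = 0.6            # Cov(v2, u1)
tau2_sq = 2.0         # Var(v2)
beta1 = [1.0, -0.5]   # stands in for (delta1, alpha1), with scalar entries

theta1 = eta1 / tau2_sq          # coefficient in u1 = theta1*v2 + e1
rho1_sq = eta1 ** 2 / tau2_sq    # squared correlation, since Var(u1) = 1

# Scaled parameters identified by the second-step probit.
scale = math.sqrt(1.0 - rho1_sq)
beta_rho1 = [b / scale for b in beta1]
theta_rho1 = theta1 / scale

# Unscaling: beta1 = beta_rho1 / (1 + theta_rho1^2 * tau2^2)^(1/2).
recovered = [b / math.sqrt(1.0 + theta_rho1 ** 2 * tau2_sq) for b in beta_rho1]

print(recovered)
```

The formula works because 1 + theta_rho1^2 * tau2^2 equals 1/(1 - rho1^2) under these assumptions, so dividing by its square root undoes the original scaling.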

15.15. a. This example falls into the situation described below equation (12.41). Namely,
the scores from the two optimization problems are uncorrelated. This follows because the first
problem, OLS regression of $y_{i2}$ on $z_i$, depends only on the random draws $(z_i, y_{i2})$. In the
second stage, we are estimating a model for $f(y_1 \mid y_2, z)$. Letting $s_i(\gamma_1; \delta_2)$ denote the score for
the second-step MLE, with respect to $\gamma_1$, $s_i(\gamma_1; \delta_2)$ is uncorrelated with any function of
$(z_i, y_{i2})$ because $E[s_i(\gamma_1; \delta_2) \mid z_i, y_{i2}] = 0$. (I do not use a separate notation for the true values of
the parameters.)

So that we can apply the results from Section 12.4.2 directly, we set the problem up as a
minimization problem. Then, from the usual score formula for the probit model, we have

$$s_i(\gamma_1; \delta_2) = -\frac{w_{i1}(\delta_2)'\,\phi(w_{i1}(\delta_2)\gamma_1)\,[y_{i1} - \Phi(w_{i1}(\delta_2)\gamma_1)]}{\Phi(w_{i1}(\delta_2)\gamma_1)[1 - \Phi(w_{i1}(\delta_2)\gamma_1)]},$$

where $w_{i1}(\delta_2) = [x_{i1}, v_{i2}(\delta_2)]$ and $v_{i2}(\delta_2) = y_{i2} - z_i\delta_2$. The expected Hessian (that is, the
expected Jacobian of $s_i(\gamma_1; \delta_2)$ with respect to $\gamma_1$) has the usual form for binary response with a
correctly specified response probability:

$$A_1 = E\left\{\frac{w_{i1}(\delta_2)' w_{i1}(\delta_2)\,[\phi(w_{i1}(\delta_2)\gamma_1)]^2}{\Phi(w_{i1}(\delta_2)\gamma_1)[1 - \Phi(w_{i1}(\delta_2)\gamma_1)]}\right\}.$$

Next, we need $F = E[\nabla_{\delta_2} s_i(\gamma_1; \delta_2)]$. Like the Hessian in the usual binary response model,
the Jacobian $\nabla_{\delta_2} s_i(\gamma_1; \delta_2)$ is complicated. But its expectation is not. Using the fact that
$E[y_{i1} - \Phi(w_{i1}(\delta_2)\gamma_1) \mid z_i, y_{i2}] = 0$ it is easy to show

$$F = E\left\{\frac{w_{i1}(\delta_2)'\,\phi(w_{i1}(\delta_2)\gamma_1)}{\Phi(w_{i1}(\delta_2)\gamma_1)[1 - \Phi(w_{i1}(\delta_2)\gamma_1)]}\,\nabla_{\delta_2}\Phi(w_{i1}(\delta_2)\gamma_1)\right\} = -\theta_{\rho 1}\,E\left\{\frac{w_{i1}(\delta_2)' z_i\,[\phi(w_{i1}(\delta_2)\gamma_1)]^2}{\Phi(w_{i1}(\delta_2)\gamma_1)[1 - \Phi(w_{i1}(\delta_2)\gamma_1)]}\right\}.$$

Finally, we need a first-order representation for the OLS estimator, $\hat\delta_2$:

$$\sqrt{N}(\hat\delta_2 - \delta_2) = A_2^{-1} N^{-1/2}\sum_{i=1}^{N} z_i' v_{i2} + o_p(1),$$

where $A_2 = E(z_i' z_i)$. It follows that the matrix in the middle of the sandwich is

$$D = Var\{s_i(\gamma_1; \delta_2) + F A_2^{-1} z_i' v_{i2}\} = Var[s_i(\gamma_1; \delta_2)] + F A_2^{-1} Var(z_i' v_{i2}) A_2^{-1} F' = A_1 + \tau_2^2 F A_2^{-1} F'$$

because $s_i(\gamma_1; \delta_2)$ and $z_i' v_{i2}$ are uncorrelated, $Var[s_i(\gamma_1; \delta_2)] = A_1$ by the information matrix
equality, and $E(v_{i2}^2 \mid z_i) = \tau_2^2$ under homoskedasticity for $v_2$. (The results that follow do not rely
in any crucial way on $E(v_{i2}^2 \mid z_i) = \tau_2^2$; we could just drop that and use the more general
formula.) Therefore,

$$Avar[\sqrt{N}(\hat\gamma_1 - \gamma_1)] = A_1^{-1}(A_1 + \tau_2^2 F A_2^{-1} F')A_1^{-1} = A_1^{-1} + \tau_2^2 A_1^{-1} F A_2^{-1} F' A_1^{-1}.$$

It is easy to construct consistent estimators of each part using sample averages and plugging in
the consistent estimators.

b. If we ignore estimation of $\delta_2$ we act as if $Avar[\sqrt{N}(\hat\gamma_1 - \gamma_1)]$ is just $A_1^{-1}$, the inverse of
the information matrix from the second stage problem. But the correct matrix differs from $A_1^{-1}$
by $\tau_2^2 A_1^{-1} F A_2^{-1} F' A_1^{-1}$, which is positive semi-definite (and usually positive definite if $\theta_{\rho 1} \neq 0$).

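c. In Problem 12.17 we can take $g(w_i, \theta) = \phi(w_{i1}(\delta_2)\gamma_1)\gamma_1$, where $\theta = (\gamma_1', \delta_2')'$, but we have to be careful in
choosing the “score” with respect to $\theta$. The same argument as in Problem 12.17 gives us

$$N^{-1/2}\sum_{i=1}^{N} g(w_i, \hat\theta) = N^{-1/2}\sum_{i=1}^{N} g(w_i, \theta) + G\sqrt{N}(\hat\theta - \theta) + o_p(1),$$

where $G = E[\nabla_\theta g(w_i, \theta)]$. For this application, writing $\phi_i = \phi(w_{i1}(\delta_2)\gamma_1)$,

$$\nabla_\theta g(w_i, \theta) = \left[\phi_i I_{K_1+1} - \phi_i\,(w_{i1}(\delta_2)\gamma_1)\,\gamma_1 w_{i1}(\delta_2) \;\middle|\; -\phi_i\,(w_{i1}(\delta_2)\gamma_1)\,\gamma_1\gamma_1'\,\nabla_{\delta_2} w_{i1}(\delta_2)'\right],$$

where $K_1$ is the dimension of $x_1$ and $\nabla_{\delta_2} w_{i1}(\delta_2)' = [0 \mid -z_i']'$. To get a representation for
$\sqrt{N}(\hat\theta - \theta)$ we stack the first-order representations obtained in part a:

$$\sqrt{N}(\hat\theta - \theta) = N^{-1/2}\sum_{i=1}^{N} \begin{pmatrix} -A_1^{-1}[s_i(\gamma_1; \delta_2) + F A_2^{-1} z_i' v_{i2}] \\ A_2^{-1} z_i' v_{i2} \end{pmatrix} + o_p(1) \equiv N^{-1/2}\sum_{i=1}^{N} e_i(\theta) + o_p(1).$$

Then, from Problem 12.17 and the two representations just given,

$$C = Avar[\sqrt{N}(\hat\eta_1 - \eta_1)] = Var[g(w_i, \theta) - \eta_1 + G e_i(\theta)].$$

d. A consistent estimator of the asymptotic variance in part c is

$$\hat C = N^{-1}\sum_{i=1}^{N} (\hat g_i - \hat\eta_1 + \hat G\hat e_i)(\hat g_i - \hat\eta_1 + \hat G\hat e_i)',$$

where $\hat g_i = \phi(\hat w_{i1}\hat\gamma_1)\hat\gamma_1$,

$$\hat G = N^{-1}\sum_{i=1}^{N} \left[\phi(\hat w_{i1}\hat\gamma_1) I_{K_1+1} - \phi(\hat w_{i1}\hat\gamma_1)(\hat w_{i1}\hat\gamma_1)\hat\gamma_1\hat w_{i1} \;\middle|\; -\phi(\hat w_{i1}\hat\gamma_1)(\hat w_{i1}\hat\gamma_1)\hat\gamma_1\hat\gamma_1'\,\nabla_{\delta_2} w_{i1}(\hat\delta_2)'\right],$$

and

$$\hat e_i = \begin{pmatrix} -\hat A_1^{-1}[\hat s_i + \hat F\hat A_2^{-1} z_i'\hat v_{i2}] \\ \hat A_2^{-1} z_i'\hat v_{i2} \end{pmatrix}.$$

The score $\hat s_i$ and Hessian $\hat A_1$ are estimated as usual for a probit model (but with minus signs)
and

$$\hat F = -\hat\theta_{\rho 1}\,N^{-1}\sum_{i=1}^{N} \frac{\hat w_{i1}' z_i\,[\phi(\hat w_{i1}\hat\gamma_1)]^2}{\Phi(\hat w_{i1}\hat\gamma_1)[1 - \Phi(\hat w_{i1}\hat\gamma_1)]}.$$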


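15.16. a. The response probability is

$$p(x) = 1 - [1 + \exp(x\beta)]^{-\alpha}$$

and, using the chain rule,

$$\frac{\partial p(x)}{\partial x_j} = \alpha\beta_j \exp(x\beta)[1 + \exp(x\beta)]^{-\alpha - 1} = \frac{\alpha\beta_j \exp(x\beta)}{[1 + \exp(x\beta)]^{\alpha + 1}}.$$

Of course, we get the logit partial effect as a special case when $\alpha = 1$.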
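The derivative in part a can be checked against a numerical derivative. The Python sketch below is illustrative only; the function names and the values of the index and of alpha are arbitrary choices.

```python
import math

def scobit_prob(xb, alpha):
    """Skewed logit response probability: 1 - [1 + exp(xb)]^(-alpha)."""
    return 1.0 - (1.0 + math.exp(xb)) ** (-alpha)

def scobit_scale(xb, alpha):
    """Scale factor of the partial effect: alpha*exp(xb)/[1 + exp(xb)]^(alpha + 1).
    The partial effect of x_j is this factor times beta_j."""
    return alpha * math.exp(xb) / (1.0 + math.exp(xb)) ** (alpha + 1.0)

def logit_scale(xb):
    """Logit scale factor: exp(xb)/[1 + exp(xb)]^2."""
    e = math.exp(xb)
    return e / (1.0 + e) ** 2

xb, alpha, h = 0.3, 2.5, 1e-6

# Central-difference approximation to the derivative of p with respect to the index.
numeric = (scobit_prob(xb + h, alpha) - scobit_prob(xb - h, alpha)) / (2.0 * h)
analytic = scobit_scale(xb, alpha)

print(numeric, analytic)
print(scobit_scale(xb, 1.0), logit_scale(xb))
```

With alpha = 1 the scale factor reduces to the logit one, matching the special case noted in part a.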
b. The log likelihood has the usual form for a binary response. Let
$G(x, \theta) = 1 - [1 + \exp(x\beta)]^{-\alpha}$, so $1 - G(x, \theta) = [1 + \exp(x\beta)]^{-\alpha}$. Without making the
distinction between generic and “true” values,

$$\ell_i(\beta, \alpha) = -(1 - y_i)\,\alpha\log[1 + \exp(x_i\beta)] + y_i\log\{1 - [1 + \exp(x_i\beta)]^{-\alpha}\}.$$

c. The Stata output is given below. Given the estimated value of $\alpha$, $\hat\alpha$ = 413,553, the model
does not seem well determined. (Remember, the logit model imposes $\alpha = 1$.) The logit
estimates are included for comparison. The $\hat\beta_j$ are all the same sign and of roughly the
same statistical significance across the two models. The t statistic for $H_0: \log(\alpha) = 0$ is very
small, about .02.

. scobit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

Skewed logistic regression                        Number of obs     =      753
                                                  Zero outcomes     =      325
Log likelihood = -399.5222                        Nonzero outcomes  =      428

        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0148532   .0056874    -2.61   0.009    -.0260003   -.0037061
        educ |   .1512102   .0277346     5.45   0.000     .0968514    .2055689
       exper |    .139092    .020757     6.70   0.000     .0984091    .1797749
     expersq |   -.002257   .0006377    -3.54   0.000    -.0035069   -.0010072
         age |  -.0587203   .0089444    -6.57   0.000     -.076251   -.0411897
     kidslt6 |  -.9977924   .1426425    -7.00   0.000    -1.277367   -.7182183
     kidsge6 |   .0257666    .045345     0.57   0.570    -.0631079    .1146411
       _cons |  -13.09326   666.1339    -0.02   0.984    -1318.692    1292.505
-------------+----------------------------------------------------------------
    /lnalpha |   12.93254   666.1327     0.02   0.985    -1292.663    1318.529
-------------+----------------------------------------------------------------
       alpha |   413553.1   2.75e+08                             0
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=1:  chi2(1) =   4.49   Prob > chi2 = 0.0342

Note: likelihood-ratio tests are recommended for inference with scobit models

. logit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

Logistic regression                               Number of obs   =        753
                                                  LR chi2(7)      =     226.22
                                                  Prob > chi2     =     0.0000
Log likelihood = -401.76515                       Pseudo R2       =     0.2197

        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0213452   .0084214    -2.53   0.011    -.0378509   -.0048394
        educ |   .2211704   .0434396     5.09   0.000     .1360303    .3063105
       exper |   .2058695   .0320569     6.42   0.000     .1430391    .2686999
     expersq |  -.0031541   .0010161    -3.10   0.002    -.0051456   -.0011626
         age |  -.0880244    .014573    -6.04   0.000     -.116587   -.0594618
     kidslt6 |  -1.443354   .2035849    -7.09   0.000    -1.842373   -1.044335
     kidsge6 |   .0601122   .0747897     0.80   0.422     -.086473    .2066974
       _cons |   .4254524   .8603697     0.49   0.621    -1.260841    2.111746


d. The likelihood ratio statistic for $H_0: \alpha = 1$, reported in the Stata output, is 4.49 with
p-value = .034. So this statistic rejects the logit model, although it is not an overwhelming
rejection. Its p-value is certainly much smaller than that of the Wald test (t test) for $H_0: \log(\alpha) = 0$.

e. Given the bizarre value for $\hat\alpha$ and the modest gain in fit, the skewed logit model does not
seem worth the effort. Plus, the Stata output below shows that the correlations of the fitted
probabilities and inlf are very similar across the two models (.5196 for skewed logit, .5179 for
logit). The average partial effects are similar, too. For nwifeinc, the APE is
about -.0041 for skewed logit and about -.0038 for logit. For kidslt6, the APEs are -.274
(skewed logit) and -.258 (logit). It is likely these differences can be attributed to sampling
error.


. qui scobit inlf nwifeinc educ exper expersq age kidslt6 kidsge6 


. predict phat_skewlog 
(option pr assumed; Pr(inlf)) 


. predict xbh_sklog, xb 
. gen scale_sklog = e(alpha)*exp(xbh_sklog)/((1 + exp(xbh_sklog) )4(1 + e(alpha 
. sum scale_sklog 
Variable | Obs Mean Std. Dev. Min Max 
ae A Cmte et SOS acl eh 2 ond Men Bea Nie ctr ie ade wale EACAS an Ete in Dade A 


scale_sklog | 753 .2741413 . 0891063 . 0098302 . 3678786 


. predict phat_skewlog 
(option pr assumed; Pr(inlf)) 


. qui logit inlf nwifeinc educ exper expersq age kidslt6 kidsge6 


. predict phat_log 
(option pr assumed; Pr(inlf)) 


. predict xbh_log, xb

. gen scale_log = exp(xbh_log)/((1 + exp(xbh_log))^2)

. sum scale_log

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   scale_log |       753    .1785796    .0617942   .0085973        .25

. corr phat_skewlog inlf
(obs=753)

             | phat_s~g     inlf
-------------+------------------
phat_skewlog |   1.0000
        inlf |   0.5196   1.0000

. corr phat_log inlf
(obs=753)

             | phat_log     inlf
-------------+------------------
    phat_log |   1.0000
        inlf |   0.5179   1.0000


. di .2741413*(-.0148532)
-.00407188

. di .1785796*(-.0213452)
-.00381182

. di .2741413*(-.9977924)
-.27353611

. di .1785796*(-1.443354)
-.25775358
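The four `di` commands above all apply the same rule: an APE is approximated by the sample average of the scale factor times the coefficient. The following Python sketch only repeats that arithmetic from the numbers reported in the output; the dictionary and function names are illustrative, and kidslt6 is treated as if it were continuous, as in the calculations above.

```python
# Sample means of the scale factors, copied from the two `sum` outputs above.
MEAN_SCALE = {"skewed logit": 0.2741413, "logit": 0.1785796}

# Coefficients on nwifeinc and kidslt6 used in the `di` commands above.
COEF = {
    "skewed logit": {"nwifeinc": -0.0148532, "kidslt6": -0.9977924},
    "logit": {"nwifeinc": -0.0213452, "kidslt6": -1.443354},
}


def ape(model, variable):
    """Approximate APE: average scale factor times the coefficient."""
    return MEAN_SCALE[model] * COEF[model][variable]


for model in MEAN_SCALE:
    for variable in ("nwifeinc", "kidslt6"):
        print(f"{model:>12s}  {variable:<8s}  APE = {ape(model, variable):.5f}")
```

Only the scaled quantities are comparable across the two models, because the raw coefficients are measured on different scales.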


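15.17. a. We obtain the joint density by the product rule, since we have independence
conditional on $(\mathbf{x}, c)$:

$$f(y_1,\ldots,y_G|\mathbf{x},c;\gamma_o) = f_1(y_1|\mathbf{x},c;\gamma_o^1)\, f_2(y_2|\mathbf{x},c;\gamma_o^2)\cdots f_G(y_G|\mathbf{x},c;\gamma_o^G).$$

b. The density of $(y_1,\ldots,y_G)$ given $\mathbf{x}$ is obtained by integrating out with respect to the
distribution of $c$ given $\mathbf{x}$:

$$g(y_1,\ldots,y_G|\mathbf{x};\gamma_o,\delta_o) = \int_{-\infty}^{\infty}\left[\prod_{g=1}^{G} f_g(y_g|\mathbf{x},c;\gamma_o^g)\right] h(c|\mathbf{x};\delta_o)\,dc,$$

where $c$ is a dummy argument of integration. Because $c$ appears in each $D(y_g|\mathbf{x},c)$, $y_1,\ldots,y_G$
are dependent without conditioning on $c$.

c. The log-likelihood for each $i$ is

$$\log\left\{\int_{-\infty}^{\infty}\left[\prod_{g=1}^{G} f_g(y_{ig}|\mathbf{x}_i,c;\gamma^g)\right] h(c|\mathbf{x}_i;\delta)\,dc\right\}.$$

As expected, this depends only on the observed data, $(\mathbf{x}_i, y_{i1},\ldots,y_{iG})$, and the unknown
parameters.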


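15.18. a. The probability is the same as if we assume (15.73), that is,

$$P(y_{it} = 1|\mathbf{x}_i, a_i) = \Phi(\psi + \mathbf{x}_{it}\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i), \quad t = 1, 2, \ldots, T.$$

The fact that $a_i$ given $\mathbf{x}_i$ is heteroskedastic has no bearing on the distribution conditional on
$(\mathbf{x}_i, a_i)$. Only when we "integrate out" $a_i$ does $D(a_i|\mathbf{x}_i)$ matter.

b. Let $g_t(y_t|\mathbf{x}_i, a; \boldsymbol{\theta}) = [\Phi(\psi + \mathbf{x}_{it}\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a)]^{y_t}[1 - \Phi(\psi + \mathbf{x}_{it}\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a)]^{1-y_t}$. Then, by
the product and integration rules,

$$f(y_1,\ldots,y_T|\mathbf{x}_i;\boldsymbol{\theta},\boldsymbol{\delta}) = \int_{-\infty}^{\infty}\left[\prod_{t=1}^{T} g_t(y_t|\mathbf{x}_i,a;\boldsymbol{\theta})\right] h(a|\mathbf{x}_i;\boldsymbol{\delta})\,da,$$

where $h(\cdot|\mathbf{x}_i;\boldsymbol{\delta})$ is the Normal$[0, \sigma_a^2\exp(\bar{\mathbf{x}}_i\boldsymbol{\lambda})]$ density. We get the log-likelihood by plugging in
the $y_{it}$ and taking the natural log. For each $i$, the log likelihood depends on $(\mathbf{x}_i, \mathbf{y}_i)$ and the
parameters $\boldsymbol{\theta}$ and $\boldsymbol{\delta}$; $a_i$ does not appear.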
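c. To estimate the APEs we can estimate the average structural function, which in this case
is

$$\begin{aligned}
ASF(\mathbf{x}^t) &= E_{c_i}[\Phi(\mathbf{x}^t\boldsymbol{\beta} + c_i)] \\
&= E_{(\bar{\mathbf{x}}_i,a_i)}[\Phi(\psi + \mathbf{x}^t\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i)] \\
&= E_{\mathbf{x}_i}\{E[\Phi(\psi + \mathbf{x}^t\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i)|\mathbf{x}_i]\}.
\end{aligned}$$

To compute $E[\Phi(\psi + \mathbf{x}^t\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i)|\mathbf{x}_i]$ we use a similar trick as before. It is the same as
computing

$$E\{1[\psi + \mathbf{x}^t\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i + u_{it} > 0]|\mathbf{x}_i\},$$

where

$$a_i + u_{it}|\mathbf{x}_i \sim \text{Normal}(0, 1 + \sigma_a^2\exp(\bar{\mathbf{x}}_i\boldsymbol{\lambda}))$$

because $u_{it}$ is independent of $(a_i, \mathbf{x}_i)$ with a standard normal distribution. Now

$$\begin{aligned}
E(1[\psi + \mathbf{x}^t\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i + u_{it} > 0]|\mathbf{x}_i) &= P[a_i + u_{it} > -(\psi + \mathbf{x}^t\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi})|\mathbf{x}_i] \\
&= P\left[\frac{a_i + u_{it}}{[1 + \sigma_a^2\exp(\bar{\mathbf{x}}_i\boldsymbol{\lambda})]^{1/2}} > \frac{-(\psi + \mathbf{x}^t\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi})}{[1 + \sigma_a^2\exp(\bar{\mathbf{x}}_i\boldsymbol{\lambda})]^{1/2}}\,\Big|\,\mathbf{x}_i\right] \\
&= \Phi\left[\frac{\psi + \mathbf{x}^t\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi}}{[1 + \sigma_a^2\exp(\bar{\mathbf{x}}_i\boldsymbol{\lambda})]^{1/2}}\right].
\end{aligned}$$

(Notice this only depends on $\bar{\mathbf{x}}_i$, not on $\mathbf{x}_i$. We could relax that assumption.)

The ASF is therefore

$$ASF(\mathbf{x}^t) = E_{\bar{\mathbf{x}}_i}\left\{\Phi\left[\frac{\psi + \mathbf{x}^t\boldsymbol{\beta} + \bar{\mathbf{x}}_i\boldsymbol{\xi}}{[1 + \sigma_a^2\exp(\bar{\mathbf{x}}_i\boldsymbol{\lambda})]^{1/2}}\right]\right\}$$

and a consistent estimator is obtained by using a sample average and plugging in the maximum
likelihood estimators:

$$\widehat{ASF}(\mathbf{x}^t) = N^{-1}\sum_{i=1}^{N}\Phi\left[\frac{\hat{\psi} + \mathbf{x}^t\hat{\boldsymbol{\beta}} + \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}}}{[1 + \hat{\sigma}_a^2\exp(\bar{\mathbf{x}}_i\hat{\boldsymbol{\lambda}})]^{1/2}}\right].$$

Now take derivatives and changes with respect to $\mathbf{x}^t$ (a placeholder).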
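As a rough illustration of the estimator in part c, the Python sketch below averages the scaled probit index over the time averages $\bar{\mathbf{x}}_i$ and then approximates an APE by a finite difference in $\mathbf{x}^t$. All parameter values and the three time averages are hypothetical placeholders, not estimates from any data set; only the functional form is taken from the derivation above.

```python
from math import exp, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def asf_hat(x_t, xbar, psi, beta, xi, sigma2_a, lam):
    """Sample-average ASF at the fixed covariate vector x_t.

    xbar is a list of time-average vectors, one per cross-section unit.
    """
    total = 0.0
    for xbar_i in xbar:
        index = psi + dot(x_t, beta) + dot(xbar_i, xi)
        # Heteroskedastic scaling: Var(a_i + u_it | x_i) = 1 + sigma2_a*exp(xbar_i*lam)
        scale = sqrt(1.0 + sigma2_a * exp(dot(xbar_i, lam)))
        total += PHI(index / scale)
    return total / len(xbar)


# Hypothetical values for illustration only.
xbar_sample = [[0.2], [1.0], [1.7]]
params = dict(psi=-0.3, beta=[0.8], xi=[0.4], sigma2_a=0.5, lam=[0.3])

# APE of the single covariate at x_t = 1, approximated by a centered difference.
h = 1e-5
ape_at_1 = (asf_hat([1.0 + h], xbar_sample, **params)
            - asf_hat([1.0 - h], xbar_sample, **params)) / (2 * h)
print(round(ape_at_1, 4))
```

Setting `lam` to zero gives the usual homoskedastic CRE probit scaling, and setting `sigma2_a` to zero removes the scaling altogether.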
15.19. a. The Stata output is below. We need to assume first-order dynamics for the usual
standard errors and test statistics to be valid.


. tab year

   81 to 87 |      Freq.     Percent        Cum.
------------+-----------------------------------
         81 |      1,738       14.29       14.29
         82 |      1,738       14.29       28.57
         83 |      1,738       14.29       42.86
         84 |      1,738       14.29       57.14
         85 |      1,738       14.29       71.43
         86 |      1,738       14.29       85.71
         87 |      1,738       14.29      100.00
------------+-----------------------------------
      Total |     12,166      100.00

. tab black if year == 87

=1 if black |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,065       61.28       61.28
          1 |        673       38.72      100.00
------------+-----------------------------------
      Total |      1,738      100.00

. xtset id year
       panel variable:  id (strongly balanced)
        time variable:  year, 81 to 87
                delta:  1 unit

. gen employ_1 = l.employ
(1738 missing values generated)

. probit employ employ_1 if black

Probit regression                                 Number of obs   =       4038
                                                  LR chi2(1)      =    1091.27
                                                  Prob > chi2     =     0.0000
Log likelihood = -2248.0349                       Pseudo R2       =     0.1953

------------------------------------------------------------------------------
      employ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    employ_1 |   1.389433   .0437182    31.78   0.000     1.303747    1.475119
       _cons |  -.5396127   .0281709   -19.15   0.000    -.5948268   -.4843987
------------------------------------------------------------------------------


b. After estimating the previous model, the Stata calculations are below. The difference in
employment probabilities this year, based on employment status last year, is about .508.

. di normal(_b[_cons])
.29473206

. di normal(_b[_cons] + _b[employ_1])
.80228758

. di normal(_b[_cons] + _b[employ_1]) - normal(_b[_cons])
.50755552
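The three `di` commands can be checked outside Stata. The sketch below plugs the two reported probit coefficients into the standard normal cdf; the variable names are illustrative, and the only inputs are numbers copied from the output in part a.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf, the analogue of Stata's normal()

B_CONS = -0.5396127     # _b[_cons] from the probit in part a
B_EMPLOY_1 = 1.389433   # _b[employ_1] from the probit in part a

p_if_not_employed_last_year = PHI(B_CONS)
p_if_employed_last_year = PHI(B_CONS + B_EMPLOY_1)
state_dependence = p_if_employed_last_year - p_if_not_employed_last_year

print(f"P(employ=1 | employ_1=0) = {p_if_not_employed_last_year:.4f}")
print(f"P(employ=1 | employ_1=1) = {p_if_employed_last_year:.4f}")
print(f"difference               = {state_dependence:.4f}")
```

Because the model has no other regressors, this difference is the same for every man in the sample, so no averaging is needed at this stage.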


c. With year dummies, the story is very similar. The estimated state dependence for 1987 is
about .472.

. probit employ employ_1 y83-y87 if black

Probit regression                                 Number of obs   =       4038
                                                  LR chi2(6)      =    1156.98
                                                  Prob > chi2     =     0.0000
Log likelihood = -2215.1795                       Pseudo R2       =     0.2071

---------------------------------------------------------------
      employ |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+-------------------------------------------------
    employ_1 |   1.321349   .0453568      1.232452    1.410247
         y83 |   .3427664   .0749844      .1957997    .4897331
         y84 |   .4586078   .0755742      .3104851    .6067305
         y85 |   .5200576   .0767271      .3696753    .6704399
         y86 |   .3936516   .0774704      .2418125    .5454907
         y87 |   .5292136   .0773031      .3777023     .680725
       _cons |  -.8850412   .0556042     -.9940233    -.776059
---------------------------------------------------------------

. di normal(_b[_cons] + _b[y87] + _b[employ_1]) - normal(_b[_cons] + _b[y87])
.4718734


d. The output below gives one way in Stata to estimate the dynamic unobserved effects model.
Compared with not allowing for heterogeneity as in part c, the coefficient on $employ_{-1}$ has
fallen: from about 1.321 to about .899. In addition, the coefficient on the initial condition is
.566 and it is very statistically significant. But we cannot know how much the amount of state
dependence has changed without computing an average partial effect.


. gen employ81 = employ if y81
(10428 missing values generated)

. replace employ81 = employ[_n-1] if y82
(1738 real changes made)

. replace employ81 = employ[_n-2] if y83
(1738 real changes made)

. replace employ81 = employ[_n-3] if y84
(1738 real changes made)

. replace employ81 = employ[_n-4] if y85
(1738 real changes made)

. replace employ81 = employ[_n-5] if y86
(1738 real changes made)

. replace employ81 = employ[_n-6] if y87
(1738 real changes made)


. xtprobit employ employ_1 employ81 y83-y87 if black, re

Random-effects probit regression                Number of obs
Group variable: id                              Number of groups

Random effects u_i ~ Gaussian                   Obs per group: min
                                                               avg
                                                               max

                                                Wald chi2(7)       =    677.59
Log likelihood  = -2176.3738                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      employ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    employ_1 |   .8987806   .0677058    13.27   0.000     .7660797    1.031482
    employ81 |   .5662897   .0884941     6.40   0.000     .3928444     .739735
         y83 |   .4339911   .0804064     5.40   0.000     .2763974    .5915847
         y84 |   .6563094   .0841199     7.80   0.000     .4914374    .8211814
         y85 |   .7919805   .0887167     8.93   0.000      .618099     .965862
         y86 |   .6896344    .090158     7.65   0.000     .5129279    .8663409
         y87 |   .8382018    .091054     9.21   0.000     .6597393    1.016664
       _cons |  -1.005103   .0660945   -15.21   0.000    -1.134646   -.8755602
-------------+----------------------------------------------------------------
    /lnsig2u |  -1.178731   .1995372                     -1.569817   -.7876454
-------------+----------------------------------------------------------------
     sigma_u |   .5546791   .0553396                      .4561615    .6744736
         rho |   .2352804   .0359014                      .1722425    .3126745
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =    47.90 Prob >= chibar2 = 0.000


e. There is still plenty of evidence of state dependence because of the very statistically
significant coefficient on $employ_{-1}$ (t = 13.27). The coefficient still seems quite large, but we
still need to compute the APE.

The positive coefficient on employ81 shows that $c_i$ and $employ_{i,1981}$ are positively
correlated. The estimate of $\sigma_a^2$ is $(.5546791)^2$, or $\hat{\sigma}_a^2 \approx .308$.

f. The average state dependence, where we average out the distribution of $c_i$, is estimated
as

$$N^{-1}\sum_{i=1}^{N}\left\{\Phi\left[\frac{\hat{\psi} + \hat{\delta}_{87} + \hat{\rho} + \hat{\xi}_0 y_{i0}}{(1 + \hat{\sigma}_a^2)^{1/2}}\right] - \Phi\left[\frac{\hat{\psi} + \hat{\delta}_{87} + \hat{\xi}_0 y_{i0}}{(1 + \hat{\sigma}_a^2)^{1/2}}\right]\right\},$$

where $\hat{\rho}$ is the coefficient on $y_{-1} = employ_{-1}$, $y_{i0} = employ_{i,1981}$ (with coefficient $\hat{\xi}_0$), $\hat{\psi}$ is
the intercept, $\hat{\delta}_{87}$ is the coefficient on y87, and, in this case, the
averaging is done across the black men in the sample. The Stata calculations below (done after
the calculations in part d) show the estimated state dependence is about .283, which is much
lower than the estimate of .472 from part c (where we ignored heterogeneity). Bootstrapping is
a convenient way to obtain a standard error, as was done in Example 15.6.


. gen stdep = normal((_b[_cons] + _b[employ_1] + _b[employ81]*employ81
        + _b[y87])/sqrt(1 + e(sigma_u)^2))
        - normal((_b[_cons] + _b[employ81]*employ81 + _b[y87])
        /sqrt(1 + e(sigma_u)^2)) if black & y87
(11493 missing values generated)

. sum stdep

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       stdep |       673     .283111    .0257298   .2353074   .2969392
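Because employ81 is binary, `stdep` takes only two distinct values, and these should be the Min and Max in the summary above. The Python sketch below recomputes them from the reported random effects probit estimates; the names are illustrative. The sample mean of .283 cannot be reproduced this way, since it also depends on the share of men with employ81 = 1, which is not reported here.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf

# Estimates copied from the xtprobit output in part d.
B = {"_cons": -1.005103, "employ_1": 0.8987806,
     "employ81": 0.5662897, "y87": 0.8382018}
SIGMA_U = 0.5546791


def state_dependence_1987(employ81):
    """Scaled-index difference in P(employ=1) for 1987, given the initial condition."""
    scale = sqrt(1.0 + SIGMA_U ** 2)
    base = B["_cons"] + B["employ81"] * employ81 + B["y87"]
    return PHI((base + B["employ_1"]) / scale) - PHI(base / scale)


sd_not_employed_1981 = state_dependence_1987(0)
sd_employed_1981 = state_dependence_1987(1)
print(f"employ81 = 0: {sd_not_employed_1981:.4f}")
print(f"employ81 = 1: {sd_employed_1981:.4f}")
```

The division by `sqrt(1 + SIGMA_U**2)` is what averages out the unobserved heterogeneity; without it the differences would be partial effects at a fixed value of the heterogeneity rather than averages over its distribution.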
15.20. (Bonus Question) Estimate the CRE probit model reported in Table 15.3 using the
generalized estimating equation (GEE) approach described in Section 12.9.2, using an
exchangeable correlation structure.

a. How do the point estimates compare with the pooled probit estimates in Column (3) of 
Table 15.3? 

b. Does it appear that the GEE approach improves on the efficiency of pooled probit? 
Explain. 

Solution: 

a. The Stata output for pooled probit and GEE is given below. The pooled probit estimates
replicate the numbers in Table 15.3.


. probit lfp kids lhinc kidsbar lhincbar educ black age agesq per2-per5,
        cluster(id)

Iteration 0:   log pseudolikelihood = -17709.021
Iteration 1:   log pseudolikelihood = -16521.245
Iteration 2:   log pseudolikelihood = -16516.437
Iteration 3:   log pseudolikelihood = -16516.436

Probit regression                                 Number of obs   =      28315
                                                  Wald chi2(12)   =     538.09
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -16516.436                 Pseudo R2       =     0.0673

                                  (Std. Err. adjusted for 5663 clusters in id)
------------------------------------------------------------------------------
             |               Robust
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.1173749   .0269743    -4.35   0.000    -.1702435   -.0645064
       lhinc |  -.0288098    .014344    -2.01   0.045    -.0569234   -.0006961
     kidsbar |  -.0856913   .0311857    -2.75   0.006     -.146814   -.0245685
    lhincbar |  -.2501781   .0352907    -7.09   0.000    -.3193466   -.1810097
        educ |   .0841338   .0067302    12.50   0.000     .0709428    .0973248
       black |   .2030668   .0663945     3.06   0.002     .0729359    .3331976
         age |   .1516424   .0124831    12.15   0.000      .127176    .1761089
       agesq |  -.0020672   .0001553   -13.31   0.000    -.0023717   -.0017628
        per2 |  -.0135701   .0103752    -1.31   0.191    -.0339051    .0067648
        per3 |  -.0331991   .0127197    -2.61   0.009    -.0581293    -.008269
        per4 |  -.0390317   .0136244    -2.86   0.004    -.0657351   -.0123284
        per5 |  -.0552425   .0146067    -3.78   0.000    -.0838711   -.0266139
       _cons |  -.7260562   .2836985    -2.56   0.010    -1.282095   -.1700173
------------------------------------------------------------------------------

. xtgee lfp kids lhinc kidsbar lhincbar educ black age agesq per2-per5,
        fam(binomial) link(probit) corr(exch) robust

GEE population-averaged model                   Number of obs      =     28315
Group variable:                         id      Number of groups   =      5663
Link:                               probit      Obs per group: min =         5
Family:                           binomial                     avg =       5.0
Correlation:                  exchangeable                     max =         5
                                                Wald chi2(12)      =    536.66
Scale parameter:                         1      Prob > chi2        =    0.0000

                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semirobust
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.1125361   .0281366    -4.00   0.000    -.1676828   -.0573894
       lhinc |  -.0276543    .014799    -1.87   0.062    -.0566598    .0013511
     kidsbar |  -.0892543   .0323884    -2.76   0.006    -.1527344   -.0257742
    lhincbar |   -.252001   .0360377    -6.99   0.000    -.3226337   -.1813684
        educ |   .0841304   .0066834    12.59   0.000     .0710312    .0972296
       black |    .205611   .0668779     3.07   0.002     .0745328    .3366893
         age |    .152809   .0125434    12.18   0.000     .1282245    .1773936
       agesq |  -.0020781   .0001565   -13.28   0.000    -.0023847   -.0017714
        per2 |  -.0134259   .0103607    -1.30   0.195    -.0337324    .0068807
        per3 |  -.0329993   .0126967    -2.60   0.009    -.0578845   -.0081141
        per4 |  -.0384026   .0136212    -2.82   0.005    -.0650997   -.0117056
        per5 |    -.05451   .0146135    -3.73   0.000     -.083152    -.025868
       _cons |  -.7532503    .285216    -2.64   0.008    -1.312263   -.1942373
------------------------------------------------------------------------------


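b. Surprisingly, and disappointingly, the GEE approach does not improve the precision of
the estimators. In fact, the robust standard errors for GEE are actually slightly above those for
pooled probit. This finding is particularly puzzling because there is substantial serial correlation
in the standardized residuals, written generally after pooled probit estimation as

$$\hat{r}_{it} = \frac{y_{it} - \Phi(\mathbf{x}_{it}\hat{\boldsymbol{\beta}} + \hat{\psi} + \bar{\mathbf{w}}_i\hat{\boldsymbol{\xi}})}{\{\Phi(\mathbf{x}_{it}\hat{\boldsymbol{\beta}} + \hat{\psi} + \bar{\mathbf{w}}_i\hat{\boldsymbol{\xi}})[1 - \Phi(\mathbf{x}_{it}\hat{\boldsymbol{\beta}} + \hat{\psi} + \bar{\mathbf{w}}_i\hat{\boldsymbol{\xi}})]\}^{1/2}},$$

where $\bar{\mathbf{w}}_i$ is the time average of variables that change across $i$ and $t$ ($kids_{it}$ and $lhinc_{it}$ in this
application). The first-order correlation in the $\{\hat{r}_{it} : t = 2,\ldots,T;\ i = 1,\ldots,N\}$ is about .83.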


. qui probit lfp kids lhinc kidsbar lhincbar educ black age agesq per2-per5

. predict phat
(option pr assumed; Pr(lfp))

. gen rh = (lfp - phat)/sqrt(phat*(1 - phat))

. gen rh_1 = l.rh
(5663 missing values generated)

. corr rh rh_1
(obs=22652)

             |       rh     rh_1
-------------+------------------
          rh |   1.0000
        rh_1 |   0.8315   1.0000
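The steps in the Stata session (standardize each residual by its binomial standard deviation, lag within unit, then correlate) can be sketched in Python as follows. The three-unit panel is a made-up toy example, not the labor force data, and the helper names are illustrative.

```python
from math import sqrt


def standardized_residual(y, p):
    """(y - p)/sqrt(p(1 - p)) for a binary outcome y with fitted probability p."""
    return (y - p) / sqrt(p * (1.0 - p))


def pearson(u, v):
    """Sample correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def first_order_corr(panel):
    """panel maps a unit id to a time-ordered list of (y, fitted probability)."""
    current, lagged = [], []
    for obs in panel.values():
        r = [standardized_residual(y, p) for y, p in obs]
        current.extend(r[1:])   # periods t = 2, ..., T
        lagged.extend(r[:-1])   # the same unit's residual one period earlier
    return pearson(current, lagged)


# Hypothetical toy panel: three units, three periods each.
toy_panel = {
    1: [(1, 0.7), (1, 0.7), (1, 0.6)],
    2: [(0, 0.6), (0, 0.5), (0, 0.6)],
    3: [(1, 0.4), (0, 0.5), (1, 0.5)],
}
rho1 = first_order_corr(toy_panel)
print(round(rho1, 3))
```

Lagging inside each unit's list mirrors what `l.rh` does after `xtset`: the first period of every unit has no lag, which is why 5,663 observations drop out in the Stata output.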


c. This is not an answer to a particular question, but serves as an erratum for the estimates in
Column (4) of Table 15.3. Those estimates were obtained using a version of Stata earlier than
9.0. Using Stata 11.0, a higher value of the log likelihood is found, and the point estimates are
different. Note that the estimated value of $\rho$, which is the pairwise correlation between any
two of the composite errors $a_i + e_{it}$, is very large: .95. The estimated scale factor for the
coefficients, about .223, is substantially below that in Table 15.3, but the coefficients reported
below are substantially higher. I have deleted the details of the numerical iterations.


. xtprobit lfp kids lhinc kidsbar lhincbar educ black age agesq per2-per5, re

Random-effects probit regression                Number of obs      =     28315
Group variable: id                              Number of groups   =      5663

Random effects u_i ~ Gaussian                   Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5

                                                Wald chi2(12)      =    623.40
Log likelihood  = -8609.9002                    Prob > chi2        =    0.0000

-----------------------------------------------------------------------
         lfp |      Coef.   Std. Err.    P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------
        kids |  -.3970102   .0701298    0.000     -.534462   -.2595584
       lhinc |  -.1003399   .0469979    0.033    -.1924541   -.0082258
     kidsbar |  -.4085664   .0898875    0.000    -.5847428   -.2323901
    lhincbar |  -.8941069   .1199703    0.000    -1.129244   -.6589695
        educ |   .3189079    .024327    0.000     .2712279     .366588
       black |   .6388784   .1903525    0.001     .2657945    1.011962
         age |   .7282057   .0445623    0.000     .6408651    .8155462
       agesq |  -.0098358   .0005747    0.000    -.0109623   -.0087094
        per2 |  -.0451653   .0499429    0.366    -.1430516     .052721
        per3 |  -.1247056   .0501522    0.013    -.2230022    -.026409
        per4 |  -.1356834   .0500679    0.007    -.2338147   -.0375522
        per5 |   -.200357    .049539    0.000    -.2974515   -.1032624
       _cons |  -5.359375   1.000514    0.000    -7.320346   -3.398404
-------------+---------------------------------------------------------
     sigma_u |   4.364995   .0951224              4.182484     4.55547
         rho |   .9501326    .002065               .945926    .9540279
-----------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =  1.6e+04 Prob >= chibar2 = 0.000

. * Scale factor for coefficients:
. di 1/sqrt(1 + e(sigma_u)^2)
.22331011
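Both the reported rho and the scale factor are simple functions of sigma_u, which gives a quick consistency check on the output. The Python sketch below uses only the reported sigma_u and, as one illustration, the reported random effects coefficient on kids; the variable names are illustrative.

```python
from math import sqrt

SIGMA_U = 4.364995        # sigma_u from the xtprobit output above
B_KIDS_RE = -0.3970102    # random effects probit coefficient on kids

# Correlation between composite errors in any two periods: sigma_u^2/(1 + sigma_u^2).
rho = SIGMA_U ** 2 / (1.0 + SIGMA_U ** 2)

# Factor that puts RE probit coefficients on the population-averaged scale.
scale = 1.0 / sqrt(1.0 + SIGMA_U ** 2)

print(f"rho          = {rho:.4f}")
print(f"scale factor = {scale:.4f}")
print(f"scaled kids  = {scale * B_KIDS_RE:.4f}")
```

The scaled coefficient is the quantity to set beside the pooled probit and GEE estimates, which already estimate population-averaged effects.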
Solutions to Chapter 16 Problems

16.1. a. The Stata output below contains the estimates for 1981 and, for completeness, 1987.
Certainly some magnitudes are fairly different. For example, education has a much larger
effect in the latter time period. Also, the effects of experience on the log-odds ratios are quite
different.


. mlogit status educ exper expersq black if y81, base(0)

Multinomial logistic regression                   Number of obs   =       1737
                                                  LR chi2(8)      =     720.39
                                                  Prob > chi2     =     0.0000
Log likelihood = -1502.9396                       Pseudo R2       =     0.1933

------------------------------------------------------------------------------
      status |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0            |  (base outcome)
-------------+----------------------------------------------------------------
1            |
        educ |    -.47558   .0466559   -10.19   0.000    -.5670238   -.3841361
       exper |   3.016025   .4513224     6.68   0.000     2.131449    3.900601
     expersq |  -.5953032   .2690175    -2.21   0.027    -1.122568   -.0680386
       black |   .8649358   .1302512     6.64   0.000     .6096481    1.120224
       _cons |   4.138761   .5276112     7.84   0.000     3.104662     5.17286
-------------+----------------------------------------------------------------
2            |
        educ |  -.1019564   .0495931    -2.06   0.040    -.1991571   -.0047558
       exper |   4.101794   .4359451     9.41   0.000     3.247357    4.956231
     expersq |  -.7069626   .2628842    -2.69   0.007    -1.222206   -.1917191
       black |   .0208189   .1436123     0.14   0.885    -.2606561    .3022938
       _cons |  -.0313456   .5828582    -0.05   0.957    -1.173727    1.111035
------------------------------------------------------------------------------

. mlogit status educ exper expersq black if y87, base(0)

Multinomial logistic regression                   Number of obs   =       1717
                                                  LR chi2(8)      =     583.72
                                                  Prob > chi2     =     0.0000
Log likelihood = -907.85723                       Pseudo R2       =     0.2433

------------------------------------------------------------------------------
      status |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0            |  (base outcome)
-------------+----------------------------------------------------------------
1            |
        educ |  -.6736313   .0698999    -9.64   0.000    -.8106325     -.53663
       exper |  -.1062149    .173282    -0.61   0.540    -.4458414    .2334116
     expersq |  -.0125152   .0252291    -0.50   0.620    -.0619633     .036933
       black |   .8130166   .3027231     2.69   0.007     .2196902    1.406343
       _cons |   10.27787   1.133336     9.07   0.000     8.056578    12.49917
-------------+----------------------------------------------------------------
2            |
        educ |  -.3146573   .0651096    -4.83   0.000    -.4422699   -.1870448
       exper |   .8487367   .1569856     5.41   0.000     .5410507    1.156423
     expersq |  -.0773003   .0229217    -3.37   0.001    -.1222261   -.0323746
       black |   .3113612   .2815339     1.11   0.269     -.240435    .8631574
       _cons |   5.543798   1.086409     5.10   0.000     3.414475    7.673121
------------------------------------------------------------------------------


b. Just adding year dummies is probably not sufficient, given the findings in part a, but the
results are below. Because the model is static and we have panel data, we should use inference
robust to arbitrary serial dependence. In this application, the robust standard errors are
typically larger but the difference is not huge.


. mlogit status educ exper expersq black y82-y87, base(0)

Multinomial logistic regression                   Number of obs   =      12108
                                                  LR chi2(20)     =    6409.72
                                                  Prob > chi2     =     0.0000
Log likelihood = -8842.6383                       Pseudo R2       =     0.2660

------------------------------------------------------------------------------
      status |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0            |  (base outcome)
-------------+----------------------------------------------------------------
1            |
        educ |  -.5473739   .0189537   -28.88   0.000    -.5845225   -.5102253
       exper |    .769957   .0633149    12.16   0.000     .6458621    .8940519
     expersq |  -.1153749   .0107134   -10.77   0.000    -.1363729    -.094377
       black |   .8773806   .0656223    13.37   0.000     .7487633    1.005998
         y82 |   .9871298   .0928663    10.63   0.000     .8051152    1.169144
         y83 |   1.383591   .1035337    13.36   0.000     1.180669    1.586514
         y84 |   1.587213    .115548    13.74   0.000     1.360743    1.813683
         y85 |   2.052594   .1307157    15.70   0.000     1.796396    2.308792
         y86 |   2.652847   .1513588    17.53   0.000     2.356189    2.949505
         y87 |   2.727265   .1701085    16.03   0.000     2.393858    3.060671
       _cons |   5.151552   .2282352    22.57   0.000     4.704219    5.598885
-------------+----------------------------------------------------------------
2            |
        educ |  -.2555556   .0182414   -14.01   0.000     -.291308   -.2198032
       exper |   1.823821    .058522    31.16   0.000      1.70912    1.938522
     expersq |   -.195654   .0095781   -20.43   0.000    -.2144267   -.1768813
       black |     .33846   .0649312     5.21   0.000     .2111972    .4657227
         y82 |   .5624964   .0936881     6.00   0.000     .3788712    .7461217
         y83 |   1.225732   .0998516    12.28   0.000     1.030027    1.421438
         y84 |    1.42652   .1095939    13.02   0.000      1.21172     1.64132
         y85 |   1.662994   .1243071    13.38   0.000     1.419357    1.906632
         y86 |   2.029585   .1447257    14.02   0.000     1.745928    2.313242
         y87 |   1.995639   .1622294    12.30   0.000     1.677675    2.313603
       _cons |   1.858323    .225749     8.23   0.000     1.415863    2.300783
------------------------------------------------------------------------------

. mlogit status educ exper expersq black y82-y87, base(0) cluster(id)

Multinomial logistic regression                   Number of obs   =      12108
                                                  Wald chi2(20)   =    2742.09
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -8842.6383                 Pseudo R2       =     0.2660

                       (Std. Err. adjusted for 1738 clusters in id)
---------------------------------------------------------------
             |               Robust
      status |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+-------------------------------------------------
0            |  (base outcome)
-------------+-------------------------------------------------
1            |
        educ |  -.5473739   .0200999      -.586769   -.5079789
       exper |    .769957   .0776371       .617791     .922123
     expersq |  -.1153749   .0106075     -.1361653   -.0945846
       black |   .8773806   .0855443      .7097169    1.045044
         y82 |   .9871298   .0760747      .8380261    1.136234
         y83 |   1.383591   .0888752      1.209399    1.557784
         y84 |   1.587213   .1050477      1.381323    1.793103
         y85 |   2.052594   .1275644      1.802572    2.302615
         y86 |   2.652847   .1526831      2.353593      2.9521
         y87 |   2.727265   .1666166      2.400702    3.053827
       _cons |   5.151552   .2523957      4.656866    5.646238
-------------+-------------------------------------------------
2            |
        educ |  -.2555556   .0177679       -.29038   -.2207312
       exper |   1.823821   .0731396       1.68047    1.967172
     expersq |   -.195654    .010131     -.2155104   -.1757976
       black |     .33846   .0783575      .1848821    .4920378
         y82 |   .5624964   .0796845      .4063177    .7186751
         y83 |   1.225732   .0897086      1.049907    1.401558
         y84 |    1.42652   .1027116      1.225209    1.627831
         y85 |   1.662994    .124454      1.419069     1.90692
         y86 |   2.029585   .1526669      1.730363    2.328807
         y87 |   1.995639   .1636634      1.674865    2.316414
       _cons |   1.858323   .2257666      1.415829    2.300817
---------------------------------------------------------------


c. The time dummies have very large t statistics, and the robust joint test gives a $\chi^2$ value
of 624.28, which implies a zero p-value to many decimal places.

d. After obtaining the estimates from part c, the following commands produce the change
in the estimated employment probabilities between educ = 12 and educ = 16. For 1981 the
estimated probability is about .021 lower at educ = 16 (.898 versus .919); for 1987 it is about
.058 higher (.955 versus .896).


. di exp([2]_cons + [2]educ*16 + [2]exper*5 + [2]expersq*25 + [2]black)
     /(1 + exp([1]_cons + [1]educ*16 + [1]exper*5 + [1]expersq*25 + [1]black)
     + exp([2]_cons + [2]educ*16 + [2]exper*5 + [2]expersq*25 + [2]black))
.89820453

. di exp([2]_cons + [2]educ*12 + [2]exper*5 + [2]expersq*25 + [2]black)
     /(1 + exp([1]_cons + [1]educ*12 + [1]exper*5 + [1]expersq*25 + [1]black)
     + exp([2]_cons + [2]educ*12 + [2]exper*5 + [2]expersq*25 + [2]black))
.91903414

. di .91903414 - .89820453
.02082961

. di exp([2]_cons + [2]educ*12 + [2]exper*5 + [2]expersq*25 + [2]black + [2]y87)
     /(1 + exp([1]_cons + [1]educ*12 + [1]exper*5 + [1]expersq*25 + [1]black
     + [1]y87)
     + exp([2]_cons + [2]educ*12 + [2]exper*5 + [2]expersq*25 + [2]black
     + [2]y87))
.89646574

. di exp([2]_cons + [2]educ*16 + [2]exper*5 + [2]expersq*25 + [2]black + [2]y87)
     /(1 + exp([1]_cons + [1]educ*16 + [1]exper*5 + [1]expersq*25 + [1]black
     + [1]y87)
     + exp([2]_cons + [2]educ*16 + [2]exper*5 + [2]expersq*25 + [2]black
     + [2]y87))
.95454392

. di .95454392 - .89646574
.05807818
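The long `di` expressions all evaluate one multinomial logit response probability, $\exp(\mathbf{x}\boldsymbol{\beta}_2)/[1 + \exp(\mathbf{x}\boldsymbol{\beta}_1) + \exp(\mathbf{x}\boldsymbol{\beta}_2)]$, at different covariate values. The Python sketch below wraps that formula in a function using the pooled coefficients from part b; the function and dictionary names are illustrative, and only the coefficients needed for the base year and for 1987 are included.

```python
from math import exp

# Pooled multinomial logit coefficients from part b (outcome 0 is the base).
COEF = {
    1: {"_cons": 5.151552, "educ": -0.5473739, "exper": 0.769957,
        "expersq": -0.1153749, "black": 0.8773806, "y87": 2.727265},
    2: {"_cons": 1.858323, "educ": -0.2555556, "exper": 1.823821,
        "expersq": -0.195654, "black": 0.33846, "y87": 1.995639},
}


def prob_outcome2(educ, year87, exper=5, black=1):
    """P(status = 2 | x) in the base year (1981) or, if year87 is True, in 1987."""
    x = {"_cons": 1.0, "educ": educ, "exper": exper, "expersq": exper ** 2,
         "black": black, "y87": 1.0 if year87 else 0.0}
    index = {j: sum(COEF[j][k] * x[k] for k in x) for j in COEF}
    denom = 1.0 + sum(exp(v) for v in index.values())
    return exp(index[2]) / denom


for year87 in (False, True):
    p12 = prob_outcome2(12, year87)
    p16 = prob_outcome2(16, year87)
    label = "1987" if year87 else "1981"
    print(f"{label}: P(educ=12) = {p12:.4f}, P(educ=16) = {p16:.4f}, "
          f"change 12 -> 16 = {p16 - p12:+.4f}")
```

Computing the change as educ = 16 minus educ = 12 in both years makes the sign difference between 1981 and 1987 explicit.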


16.2. a. The following Stata output contains the linear regression results. Because pctstck is
discrete (taking on the values 0, 50, and 100), it seems likely that heteroskedasticity is present
in a linear model. In fact, the robust standard errors are not very different from the usual ones.

. use pension

. tab pctstck

 0=mstbnds,5 |
0=mixed, 100 |
   =mststcks |      Freq.     Percent        Cum.
-------------+-----------------------------------
           0 |         78       34.51       34.51
          50 |         85       37.61       72.12
         100 |         63       27.88      100.00
-------------+-----------------------------------
       Total |        226      100.00


. reg pctstck choice age educ female black married finc25-finc101 wealth89
        prftshr, robust

Linear regression                                      Number of obs =     194
                                                       F( 14,   179) =    2.15
                                                       Prob > F      =  0.0113
                                                       R-squared     =  0.0998
                                                       Root MSE      =  39.134

---------------------------------------------------------------
             |               Robust
     pctstck |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+-------------------------------------------------
      choice |   12.04773   5.994437      .2188713    23.87658
         age |  -1.625967   .8327895     -3.269315    .0173813
        educ |   .7538685   1.172328     -1.559493     3.06723
      female |   1.302856   7.148595     -12.80351    15.40922
       black |   3.967391   8.974971     -13.74297    21.67775
     married |   3.303436   8.369616     -13.21237    19.81924
      finc25 |  -18.18567   16.00485     -49.76813    13.39679
      finc35 |  -3.925374   15.86275     -35.22742    27.37668
      finc50 |  -8.128784    15.3762     -38.47072    22.21315
      finc75 |  -17.57921    16.6797     -50.49335    15.33493
     finc100 |   -6.74559    16.7482      -39.7949    26.30372
     finc101 |  -28.34407   16.57814     -61.05781
    wealth89 |  -.0026918   .0114136     -.0252142
     prftshr |   15.80791   8.107663     -.1909844
       _cons |   134.1161   58.87288       17.9419
---------------------------------------------------------------

b. With relatively few husband-wife pairs (23 in this application), we do not expect big
differences in standard errors, and we do not see them. On the key variable, choice, the
cluster-robust standard error is only slightly larger. (Incidentally, this part really should not
come until Chapter 20.)


. reg pctstck choice age educ female black married finc25-finc101 wealth89
        prftshr, cluster(id)

Linear regression                                      Number of obs =     194
                                                       F( 14,   170) =    2.12
                                                       Prob > F      =  0.0128
                                                       R-squared     =  0.0998
                                                       Root MSE      =  39.134

                                   (Std. Err. adjusted for 171 clusters in id)
------------------------------------------------------------------------------
             |               Robust
     pctstck |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      choice |   12.04773   6.184085     1.95   0.053    -.1597617    24.25521
         age |  -1.625967   .8192942    -1.98   0.049    -3.243267   -.0086663
        educ |   .7538685     1.1803     0.64   0.524    -1.576064    3.083801
      female |   1.302856   7.000538     0.19   0.853    -12.51632    15.12203
       black |   3.967391   8.711611     0.46   0.649    -13.22948    21.16426
     married |   3.303436   8.624168     0.38   0.702    -13.72082    20.32769
      finc25 |  -18.18567   16.82939    -1.08   0.281    -51.40716    15.03583
      finc35 |  -3.925374   16.17574    -0.24   0.809    -35.85656    28.00581
      finc50 |  -8.128784   15.91447    -0.51   0.610    -39.54421    23.28665
      finc75 |  -17.57921    17.2789    -1.02   0.310    -51.68804    16.52963
     finc100 |   -6.74559   17.24617    -0.39   0.696    -40.78983    27.29865
     finc101 |  -28.34407   17.10783    -1.66   0.099     -62.1152     5.42707
    wealth89 |  -.0026918   .0119309    -0.23   0.822    -.0262435      .02086
     prftshr |   15.80791   8.356266     1.89   0.060    -.6874979    32.30332
       _cons |   134.1161    58.1316     2.31   0.022     19.36333    248.8688
------------------------------------------------------------------------------


. di _b[_cons] + _b[age]*60 + _b[educ]*12 + _b[female] + _b[finc50] + _b[wealth89]*150
38.374791

. di _b[_cons] + _b[age]*60 + _b[educ]*12 + _b[female] + _b[finc50] + _b[wealth89]*150
        + _b[choice]
50.422517

For later use, the predicted pctstck for the person described in the problem, with choice = 0,
is about 38.37. With choice, it is roughly 50.42.


c. The ordered probit estimates follow, including commands that provide the predictions
for pctstck with and without choice:

. oprobit pctstck choice age educ female black married finc25-finc101 wealth89
        prftshr

Iteration 0:   log likelihood = -212.37031
Iteration 1:   log likelihood =  -202.0094
Iteration 2:   log likelihood =  -201.9865
Iteration 3:   log likelihood =  -201.9865

Ordered probit regression                         Number of obs   =        194
                                                  LR chi2(14)     =      20.77
                                                  Prob > chi2     =     0.1077
Log likelihood = -201.9865                        Pseudo R2       =     0.0489

------------------------------------------------------------------------------
     pctstck |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      choice |    .371171   .1841121     2.02   0.044      .010318    .7320241
         age |  -.0500516   .0226063    -2.21   0.027    -.0943591    -.005744
        educ |   .0261382   .0352561     0.74   0.458    -.0429626    .0952389
      female |   .0455642    .206004     0.22   0.825    -.3581963    .4493246
       black |   .0933923   .2820403     0.33   0.741    -.4593965    .6461811
     married |   .0935981   .2332114     0.40   0.688    -.3634878     .550684
      finc25 |  -.5784299    .423162    -1.37   0.172    -1.407812    .2509524
      finc35 |  -.1346721   .4305242    -0.31   0.754    -.9784841    .7091399
      finc50 |  -.2620401   .4265936    -0.61   0.539    -1.098148    .5740681
      finc75 |  -.5662312   .4780035    -1.18   0.236    -1.503101    .3706385
     finc100 |  -.2278963   .4685942    -0.49   0.627    -1.146324
     finc101 |  -.8641109   .5291111    -1.63   0.102     -1.90115
    wealth89 |  -.0000956   .0003737    -0.26   0.798    -.0008279
     prftshr |   .4817182   .2161233     2.23   0.026     .0581243
-------------+----------------------------------------------------------------
       /cut1 |  -3.087373   1.623765                     -6.269894    .0951479
       /cut2 |  -2.053553   1.618611                     -5.225972    1.118865
------------------------------------------------------------------------------

. di _b[age]*60 + _b[educ]*12 + _b[female] + _b[finc50] + _b[wealth89]*150
-2.9202491

. di normal(_b[/cut2] + 2.9202491) - normal(_b[/cut1] + 2.9202491)
.37330935

. di 1 - normal(_b[/cut2] + 2.9202491)
.19305438

. di 50*.37330935 + 100*.19305438
37.970906

. di _b[age]*60 + _b[educ]*12 + _b[female] + _b[finc50] + _b[wealth89]*150
        + _b[choice]
-2.5490781

. di normal(_b[/cut2] + 2.5490781) - normal(_b[/cut1] + 2.5490781)
.39469838

. di 1 - normal(_b[/cut2] + 2.5490781)
.31011489

. di 50*.39469838 + 100*.31011489
50.746408

. di 50.75 - 37.97
12.78


Using ordered probit, the effect of having choice for this person is about 12.8 percentage 
points more invested in the stock market, which is pretty similar to the 12.1 points obtained 
with the linear model. 
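The calculation behind these numbers is $E(pctstck|\mathbf{x}) = 50\cdot P(pctstck = 50|\mathbf{x}) + 100\cdot P(pctstck = 100|\mathbf{x})$, with the probabilities taken from the ordered probit cut points. The Python sketch below repeats it from the reported cut points, the reported coefficient on choice, and the linear index for the person without choice as displayed by Stata; the names are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist().cdf

CUT1, CUT2 = -3.087373, -2.053553   # /cut1 and /cut2 from the oprobit output
B_CHOICE = 0.371171                 # coefficient on choice
XB_NO_CHOICE = -2.9202491           # linear index without choice, as displayed above


def expected_pctstck(xb):
    """E(pctstck | x) implied by the ordered probit, outcomes 0, 50, 100."""
    p50 = PHI(CUT2 - xb) - PHI(CUT1 - xb)
    p100 = 1.0 - PHI(CUT2 - xb)
    return 50.0 * p50 + 100.0 * p100


e_without = expected_pctstck(XB_NO_CHOICE)
e_with = expected_pctstck(XB_NO_CHOICE + B_CHOICE)
effect = e_with - e_without
print(f"without choice: {e_without:.2f}")
print(f"with choice:    {e_with:.2f}")
print(f"effect:         {effect:.2f}")
```

The outcome 0 contributes nothing to the expectation, which is why only two probabilities appear.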

d. We can compute an R-squared for the ordered probit model by using the squared
correlation between the predicted $pctstck_i$ and the actual. The following Stata session does this,
after using the oprobit command. The squared correlation for ordered probit is about .097,
which is actually slightly below the linear model R-squared, .098. The correlation between the
fitted values for the linear and OP models is very high: .998.

. qui oprobit pctstck choice age educ female black married finc25-finc101 wealth89

. predict p1hat p2hat p3hat
(option pr assumed; predicted probabilities)
(32 missing values generated)

. sum p1hat p2hat p3hat

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       p1hat |       194     .331408    .1327901   .0685269   .8053644
       p2hat |       194    .3701685    .0321855   .1655734   .3947809
       p3hat |       194    .2984236    .1245914   .0290621   .6747374

. gen pctstck_op = 50*p2hat + 100*p3hat
(32 missing values generated)

. corr pctstck pctstck_op
(obs=194)

             |  pctstck pctstc~p
-------------+------------------
     pctstck |   1.0000
  pctstck_op |   0.3119   1.0000

. di .312^2
.097344

. qui reg pctstck choice age educ female black married finc25-finc101 wealth89

. predict pctstck_lin
(option xb assumed; fitted values)
(32 missing values generated)

. corr pctstck_lin pctstck_op
(obs=194)

             | pctstc~n pctstc~p
-------------+------------------
 pctstck_lin |   1.0000
  pctstck_op |   0.9980   1.0000


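16.3. a. We can derive the response probabilities from the latent variable formulation in
(16.21) and the rule in (16.22). Multiplying $y^*$ by $\exp(-\mathbf{x}_1\boldsymbol{\delta})$ gives

$$\exp(-\mathbf{x}_1\boldsymbol{\delta})y^* = \exp(-\mathbf{x}_1\boldsymbol{\delta})\mathbf{x}\boldsymbol{\beta} + \exp(-\mathbf{x}_1\boldsymbol{\delta})e = \exp(-\mathbf{x}_1\boldsymbol{\delta})\mathbf{x}\boldsymbol{\beta} + a,$$

where $a|\mathbf{x} \sim \text{Normal}(0,1)$.
Now $\alpha_j < y^* < \alpha_{j+1}$ if and only if $\exp(-\mathbf{x}_1\boldsymbol{\delta})\alpha_j < \exp(-\mathbf{x}_1\boldsymbol{\delta})y^* < \exp(-\mathbf{x}_1\boldsymbol{\delta})\alpha_{j+1}$, and so

$$\begin{aligned}
P(y = j|\mathbf{x}) &= P[\exp(-\mathbf{x}_1\boldsymbol{\delta})\alpha_j < \exp(-\mathbf{x}_1\boldsymbol{\delta})\mathbf{x}\boldsymbol{\beta} + a < \exp(-\mathbf{x}_1\boldsymbol{\delta})\alpha_{j+1}|\mathbf{x}] \\
&= P[\exp(-\mathbf{x}_1\boldsymbol{\delta})\alpha_j - \exp(-\mathbf{x}_1\boldsymbol{\delta})\mathbf{x}\boldsymbol{\beta} < a < \exp(-\mathbf{x}_1\boldsymbol{\delta})\alpha_{j+1} - \exp(-\mathbf{x}_1\boldsymbol{\delta})\mathbf{x}\boldsymbol{\beta}|\mathbf{x}] \\
&= \Phi[\exp(-\mathbf{x}_1\boldsymbol{\delta})(\alpha_{j+1} - \mathbf{x}\boldsymbol{\beta})] - \Phi[\exp(-\mathbf{x}_1\boldsymbol{\delta})(\alpha_j - \mathbf{x}\boldsymbol{\beta})].
\end{aligned}$$

A similar argument holds at $j = 0$ and $j = J$. Therefore, as described in the text, the response
probabilities for the heteroskedastic ordered probit are of the same form as usual ordered probit
but with $\alpha_j - \mathbf{x}\boldsymbol{\beta}$ everywhere replaced with $\exp(-\mathbf{x}_1\boldsymbol{\delta})(\alpha_j - \mathbf{x}\boldsymbol{\beta})$.

b. We can obtain a useful variable addition test (VAT) by applying the score statistic, just as in
the binary probit case. The score of the log likelihood with respect to $(\boldsymbol{\alpha}', \boldsymbol{\beta}')'$, evaluated at
$\boldsymbol{\delta} = \mathbf{0}$, is easily seen to just be the usual score for ordered probit. For $0 < j < J$, the score of
the response probability with respect to $\boldsymbol{\delta}$ evaluated at $\boldsymbol{\delta} = \mathbf{0}$ is

$$1[y_i = j]\cdot\frac{-\mathbf{x}_{i1}'[(\alpha_{j+1} - \mathbf{x}_i\boldsymbol{\beta})\phi(\alpha_{j+1} - \mathbf{x}_i\boldsymbol{\beta}) - (\alpha_j - \mathbf{x}_i\boldsymbol{\beta})\phi(\alpha_j - \mathbf{x}_i\boldsymbol{\beta})]}{\Phi(\alpha_{j+1} - \mathbf{x}_i\boldsymbol{\beta}) - \Phi(\alpha_j - \mathbf{x}_i\boldsymbol{\beta})}.$$

For $j = 0$ and $j = J$ we have

$$1[y_i = 0]\cdot\frac{-\mathbf{x}_{i1}'[(\alpha_1 - \mathbf{x}_i\boldsymbol{\beta})\phi(\alpha_1 - \mathbf{x}_i\boldsymbol{\beta})]}{\Phi(\alpha_1 - \mathbf{x}_i\boldsymbol{\beta})}$$

$$1[y_i = J]\cdot\frac{\mathbf{x}_{i1}'[(\alpha_J - \mathbf{x}_i\boldsymbol{\beta})\phi(\alpha_J - \mathbf{x}_i\boldsymbol{\beta})]}{1 - \Phi(\alpha_J - \mathbf{x}_i\boldsymbol{\beta})}.$$

It is easily seen that these are identical to the scores that would be obtained by adding

$$-\mathbf{x}_{i1}(\alpha_j - \mathbf{x}_i\boldsymbol{\beta})$$

as a set of explanatory variables to the usual OP model and testing their joint significance. In
practice, we would replace the $\alpha_j$ and $\boldsymbol{\beta}$ with the MLEs from the original OP problem.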


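d. The ASF can be written as

$$\begin{aligned}
ASF(\mathbf{x}) &= E_{e_i}(1[\alpha_1 - \mathbf{x}\boldsymbol{\beta} < e_i < \alpha_2 - \mathbf{x}\boldsymbol{\beta}]) \\
&= P(\alpha_1 - \mathbf{x}\boldsymbol{\beta} < e_i < \alpha_2 - \mathbf{x}\boldsymbol{\beta}) \\
&= F_e(\alpha_2 - \mathbf{x}\boldsymbol{\beta}) - F_e(\alpha_1 - \mathbf{x}\boldsymbol{\beta}),
\end{aligned}$$

where $F_e(\cdot)$ is the cdf of $e_i$. We do not know $F_e$ because it depends on the distribution of $\mathbf{x}_{i1}$: we have specified $D(e_i|\mathbf{x}_i) = D(e_i|\mathbf{x}_{i1})$, not $D(e_i)$.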


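e. From iterated expectations we can write

$$ASF(\mathbf{x}) = E_{\mathbf{x}_{i1}}\{E(1[\alpha_1 - \mathbf{x}\boldsymbol{\beta} < e_i < \alpha_2 - \mathbf{x}\boldsymbol{\beta}]\,|\,\mathbf{x}_{i1})\}$$

and the conditional expectation is a conditional probability:

$$\begin{aligned}
E(1[\alpha_1 - \mathbf{x}\boldsymbol{\beta} < e_i < \alpha_2 - \mathbf{x}\boldsymbol{\beta}]\,|\,\mathbf{x}_{i1}) &= P(\alpha_1 - \mathbf{x}\boldsymbol{\beta} < e_i < \alpha_2 - \mathbf{x}\boldsymbol{\beta}\,|\,\mathbf{x}_{i1}) \\
&= P[\exp(-\mathbf{x}_{i1}\boldsymbol{\delta})(\alpha_1 - \mathbf{x}\boldsymbol{\beta}) < a_i < \exp(-\mathbf{x}_{i1}\boldsymbol{\delta})(\alpha_2 - \mathbf{x}\boldsymbol{\beta})\,|\,\mathbf{x}_{i1}] \\
&= \Phi[\exp(-\mathbf{x}_{i1}\boldsymbol{\delta})(\alpha_2 - \mathbf{x}\boldsymbol{\beta})] - \Phi[\exp(-\mathbf{x}_{i1}\boldsymbol{\delta})(\alpha_1 - \mathbf{x}\boldsymbol{\beta})].
\end{aligned}$$

Therefore,

$$ASF(\mathbf{x}) = E_{\mathbf{x}_{i1}}\{\Phi[\exp(-\mathbf{x}_{i1}\boldsymbol{\delta})(\alpha_2 - \mathbf{x}\boldsymbol{\beta})] - \Phi[\exp(-\mathbf{x}_{i1}\boldsymbol{\delta})(\alpha_1 - \mathbf{x}\boldsymbol{\beta})]\}.$$

By the law of large numbers, a consistent estimator is

$$N^{-1}\sum_{i=1}^{N}\{\Phi[\exp(-\mathbf{x}_{i1}\boldsymbol{\delta})(\alpha_2 - \mathbf{x}\boldsymbol{\beta})] - \Phi[\exp(-\mathbf{x}_{i1}\boldsymbol{\delta})(\alpha_1 - \mathbf{x}\boldsymbol{\beta})]\}$$

and, by Lemma 12.1, consistency is preserved if we insert the (consistent) MLEs:

$$\widehat{ASF}(\mathbf{x}) = N^{-1}\sum_{i=1}^{N}\{\Phi[\exp(-\mathbf{x}_{i1}\hat{\boldsymbol{\delta}})(\hat{\alpha}_2 - \mathbf{x}\hat{\boldsymbol{\beta}})] - \Phi[\exp(-\mathbf{x}_{i1}\hat{\boldsymbol{\delta}})(\hat{\alpha}_1 - \mathbf{x}\hat{\boldsymbol{\beta}})]\}.$$

The APEs are estimated by taking derivatives or changes with respect to elements of $\mathbf{x}$ in $\widehat{ASF}(\mathbf{x})$.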
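The averaging step in the ASF estimator can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `asf_hat`, the sample of heteroskedasticity indices, and all numerical values are hypothetical, and the index x*beta-hat is held fixed while the average is taken over the sample values of x_i1*delta-hat.

```python
import math

def norm_cdf(z):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def asf_hat(xb, x1d_sample, a1, a2):
    """Sample average of Phi[exp(-x_i1*d)(a2 - xb)] - Phi[exp(-x_i1*d)(a1 - xb)].

    xb         : x*beta-hat at the chosen x (held fixed across i)
    x1d_sample : the values x_i1*delta-hat, i = 1, ..., N
    a1, a2     : the two estimated cut points
    """
    total = 0.0
    for x1d in x1d_sample:
        scale = math.exp(-x1d)
        total += norm_cdf(scale * (a2 - xb)) - norm_cdf(scale * (a1 - xb))
    return total / len(x1d_sample)

# Hypothetical inputs for illustration only.
x1d_sample = [-0.4, -0.1, 0.0, 0.2, 0.5]
asf = asf_hat(xb=0.3, x1d_sample=x1d_sample, a1=-1.0, a2=0.4)
asf_homosk = asf_hat(xb=0.3, x1d_sample=[0.0] * 5, a1=-1.0, a2=0.4)
print(asf, asf_homosk)
```

With every heteroskedasticity index set to zero the average collapses to the usual ordered probit probability, which is a convenient sanity check.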
16.4. a. The results of the ordered probit estimation using invest as the response variable are given below. Every statistic is identical to when pctstck is used as the response variable. This is as it should be, as only the order of the outcomes matters, not the magnitudes.

. gen invest = 0 if pctstck == 0
(148 missing values generated)

. replace invest = 1 if pctstck == 50
(85 real changes made)

. replace invest = 2 if pctstck == 100
(63 real changes made)

. oprobit invest choice age educ female black married finc25-finc101 wealth89


Ordered probit regression                         Number of obs   =        194
                                                  LR chi2(14)     =      20.77
                                                  Prob > chi2     =     0.1077
Log likelihood = -201.9865                        Pseudo R2       =     0.0489

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      choice |    .371171   .1841121     2.02   0.044      .010318    .7320241
         age |  -.0500516   .0226063    -2.21   0.027    -.0943591    -.005744
        educ |   .0261382   .0352561     0.74   0.458    -.0429626    .0952389
      female |   .0455642    .206004     0.22   0.825    -.3581963    .4493246
       black |   .0933923   .2820403     0.33   0.741    -.4593965    .6461811
     married |   .0935981   .2332114     0.40   0.688    -.3634878     .550684
      finc25 |  -.5784299    .423162    -1.37   0.172    -1.407812    .2509524
      finc35 |  -.1346721   .4305242    -0.31   0.754    -.9784841    .7091399
      finc50 |  -.2620401   .4265936    -0.61   0.539    -1.098148    .5740681
      finc75 |  -.5662312   .4780035    -1.18   0.236    -1.503101    .3706385
     finc100 |  -.2278963   .4685942    -0.49   0.627    -1.146324    .6905316
     finc101 |  -.8641109   .5291111    -1.63   0.102     -1.90115    .1729279
    wealth89 |  -.0000956   .0003737    -0.26   0.798    -.0008279    .0006368
     prftshr |   .4817182   .2161233     2.23   0.026     .0581243     .905312
-------------+----------------------------------------------------------------
       /cut1 |  -3.087373   1.623765                     -6.269894    .0951479
       /cut2 |  -2.053553   1.618611                     -5.225972    1.118865
------------------------------------------------------------------------------


b. One quantity that would change is the estimated expected value, something pretty obvious because of the rescaling. In particular,

$$E(invest|\mathbf{x}) = P(invest = 1|\mathbf{x}) + 2\cdot P(invest = 2|\mathbf{x})$$

whereas

$$\begin{aligned}
E(pctstck|\mathbf{x}) &= 50\cdot P(pctstck = 50|\mathbf{x}) + 100\cdot P(pctstck = 100|\mathbf{x}) \\
&= 50\cdot P(invest = 1|\mathbf{x}) + 100\cdot P(invest = 2|\mathbf{x}) \\
&= 50\cdot E(invest|\mathbf{x}).
\end{aligned}$$

Because $pctstck = 50\cdot invest$, $E(pctstck|\mathbf{x}) = 50\cdot E(invest|\mathbf{x})$.


16.5. a. We have

$$D(y_2|\mathbf{z}) = \text{Normal}(\mathbf{z}\boldsymbol{\delta}_2, \exp(\mathbf{z}\boldsymbol{\xi}_2)),$$

which means we should use maximum likelihood to estimate $\boldsymbol{\delta}_2$ and $\boldsymbol{\xi}_2$. In fact, $\hat{\boldsymbol{\delta}}_2$ is asymptotically equivalent to a weighted least squares estimator using weights $\exp(-\mathbf{z}_i\boldsymbol{\xi}_2)$.

b. By assumption, $(u_1, e_2)$ is independent of $\mathbf{z}$ and so $D(u_1|e_2,\mathbf{z}) = D(u_1|e_2)$. Because $(u_1, e_2)$ is bivariate normal with zero mean, we can always write

$$u_1 = \theta_1 e_2 + e_1$$

where

$$e_1|e_2 \sim \text{Normal}(0, \tau_1^2),$$

$\tau_1^2 = \sigma_1^2 - \theta_1^2$, and $\sigma_1^2 = \text{Var}(u_1)$. This is necessarily the distribution conditional on $\mathbf{z}$, too.

We can write

$$y_2 = \mathbf{z}\boldsymbol{\delta}_2 + \exp(\mathbf{z}\boldsymbol{\xi}_2/2)e_2,$$

which shows that $y_2$ is a function of $(\mathbf{z}, e_2)$. Therefore, $e_1$ is independent of $y_2$, too.


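c. We can use the latent variable formulation in equation (16.30) and insert $u_1 = \theta_1 e_2 + e_1$:

$$y_1^* = \mathbf{z}_1\boldsymbol{\delta}_1 + \gamma_1 y_2 + \theta_1 e_2 + e_1$$

$$e_1|\mathbf{z}, v_2 \sim \text{Normal}(0, \tau_1^2).$$

To obtain an error with a unit variance, we divide by $\tau_1$:

$$(y_1^*/\tau_1) = \mathbf{z}_1(\boldsymbol{\delta}_1/\tau_1) + (\gamma_1/\tau_1)y_2 + (\theta_1/\tau_1)e_2 + (e_1/\tau_1)$$

and then the cut parameters also get divided by $\tau_1$. For example, $\alpha_j < y_1^* < \alpha_{j+1}$ if and only if

$$(\alpha_j/\tau_1) < (y_1^*/\tau_1) < (\alpha_{j+1}/\tau_1).$$

Therefore, if we run ordered probit of

$$y_1 \text{ on } \mathbf{z}_1, y_2, e_2$$

we consistently estimate all parameters multiplied by $1/\tau_1$. Of course we do not observe $e_2$, but we can replace it with estimates because $e_2 = \exp(-\mathbf{z}\boldsymbol{\xi}_2/2)v_2$.

The two-step approach is to estimate $\boldsymbol{\delta}_2$ and $\boldsymbol{\xi}_2$ by the MLE from part a. Then create

$$\hat{v}_{i2} = y_{i2} - \mathbf{z}_i\hat{\boldsymbol{\delta}}_2$$

$$\hat{e}_{i2} = \exp(-\mathbf{z}_i\hat{\boldsymbol{\xi}}_2/2)\hat{v}_{i2}.$$

In the second step, estimate the scaled coefficients by OP of

$$y_{i1} \text{ on } \mathbf{z}_{i1}, y_{i2}, \hat{e}_{i2}.$$

Let $\hat{\alpha}_{\tau j}$, $j = 1,2,\ldots,J$, $\hat{\boldsymbol{\delta}}_{\tau 1}$, $\hat{\gamma}_{\tau 1}$, and $\hat{\theta}_{\tau 1}$ be the scaled coefficients.

Incidentally, a simple test of the null that $y_2$ is exogenous is the usual MLE $t$ statistic for $\hat{\theta}_{\tau 1}$.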


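d. We can obtain the ASF by averaging out $e_2$ in response probabilities of the form

$$\Phi(\alpha_{\tau,j+1} - \mathbf{z}_1\boldsymbol{\delta}_{\tau 1} - \gamma_{\tau 1}y_2 - \theta_{\tau 1}e_2) - \Phi(\alpha_{\tau j} - \mathbf{z}_1\boldsymbol{\delta}_{\tau 1} - \gamma_{\tau 1}y_2 - \theta_{\tau 1}e_2)$$

(for $0 < j < J$). A consistent estimator of the ASF is

$$\widehat{ASF}(\mathbf{z}_1, y_2) = N^{-1}\sum_{i=1}^{N}[\Phi(\hat{\alpha}_{\tau,j+1} - \mathbf{z}_1\hat{\boldsymbol{\delta}}_{\tau 1} - \hat{\gamma}_{\tau 1}y_2 - \hat{\theta}_{\tau 1}\hat{e}_{i2}) - \Phi(\hat{\alpha}_{\tau j} - \mathbf{z}_1\hat{\boldsymbol{\delta}}_{\tau 1} - \hat{\gamma}_{\tau 1}y_2 - \hat{\theta}_{\tau 1}\hat{e}_{i2})]$$

and, as usual, we can compute derivatives or changes with respect to the elements of $(\mathbf{z}_1, y_2)$.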


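e. Now the normal MLE is just applied to

$$\log(y_2)|\mathbf{z} \sim \text{Normal}(\mathbf{z}\boldsymbol{\delta}_2, \exp(\mathbf{z}\boldsymbol{\xi}_2))$$

and $\hat{v}_{i2} = \log(y_{i2}) - \mathbf{z}_i\hat{\boldsymbol{\delta}}_2$.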
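f. Without a distributional assumption for $D(u_1|e_2)$, allowing for endogeneity is tricky. We would still assume that $(u_1, e_2)$ is independent of $\mathbf{z}$. We could just assume we can write $u_1 = \theta_1 e_2 + e_1$ where

$$D(e_1|e_2,\mathbf{z}) = \text{Normal}(0, \tau_1^2).$$

Then the two-step method from part c, with ASF estimated as in part d, applies but where $\hat{\boldsymbol{\delta}}_2$ and $\hat{\boldsymbol{\xi}}_2$ are obtained from a suitable estimation procedure. It could be a several step procedure or, more conveniently, a single step based on the normal quasi-MLE. That is, we act as if

$$D(y_2|\mathbf{z}) = \text{Normal}(\exp(\mathbf{z}\boldsymbol{\delta}_2), \exp(\mathbf{z}\boldsymbol{\xi}_2))$$

even though it cannot be literally true. As the results of Gourieroux, Monfort, and Trognon (1984a) show, this estimator is generally consistent and $\sqrt{N}$-asymptotically normal. Then

$$\hat{v}_{i2} = y_{i2} - \exp(\mathbf{z}_i\hat{\boldsymbol{\delta}}_2)$$

$$\hat{e}_{i2} = \exp(-\mathbf{z}_i\hat{\boldsymbol{\xi}}_2/2)\hat{v}_{i2}$$

and the steps in part c can be followed.

A way to make the method more flexible is to add polynomials in $\hat{e}_{i2}$ to the second-stage OP. For example, if we just add a square, the ASF would be estimated as

$$\begin{aligned}
\widehat{ASF}(\mathbf{z}_1, y_2) = N^{-1}\sum_{i=1}^{N}[&\Phi(\hat{\alpha}_{\tau,j+1} - \mathbf{z}_1\hat{\boldsymbol{\delta}}_{\tau 1} - \hat{\gamma}_{\tau 1}y_2 - \hat{\theta}_{\tau 1}\hat{e}_{i2} - \hat{\eta}_{\tau 1}\hat{e}_{i2}^2) \\
&- \Phi(\hat{\alpha}_{\tau j} - \mathbf{z}_1\hat{\boldsymbol{\delta}}_{\tau 1} - \hat{\gamma}_{\tau 1}y_2 - \hat{\theta}_{\tau 1}\hat{e}_{i2} - \hat{\eta}_{\tau 1}\hat{e}_{i2}^2)],
\end{aligned}$$

where $\hat{\eta}_{\tau 1}$ is the estimate on the quadratic term.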


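16.6. This problem is similar to that treated in Papke and Wooldridge (2008) for a binary or fractional response variable. Using the expression for $c_{i1}$ we can write

$$\begin{aligned}
y_{it1}^* &= \mathbf{z}_{it1}\boldsymbol{\delta}_1 + \gamma_1 y_{it2} + \psi_1 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_1 + a_{i1} + u_{it1} \\
&= \mathbf{z}_{it1}\boldsymbol{\delta}_1 + \gamma_1 y_{it2} + \psi_1 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_1 + v_{it1}
\end{aligned}$$

where $v_{it1} = a_{i1} + u_{it1}$. Now we need to make some joint distributional assumptions concerning $v_{it1}$ and $v_{it2}$, where

$$y_{it2} = \mathbf{z}_{it}\boldsymbol{\delta}_2 + \psi_2 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_2 + v_{it2}.$$

Given the marginal normal distributions assumed in the problem, it is a small step to assuming

$$v_{it1} = \theta_1 v_{it2} + e_{it1}$$

where

$$D(e_{it1}|v_{it2}, \mathbf{z}_i) = \text{Normal}(0, \tau_1^2).$$

We could allow $\theta_1$, and even $\tau_1^2$, to depend on $\bar{\mathbf{z}}_i$.

Now, we can write a control function equation (in latent variable form) as

$$y_{it1}^* = \mathbf{z}_{it1}\boldsymbol{\delta}_1 + \gamma_1 y_{it2} + \psi_1 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_1 + \theta_1 v_{it2} + e_{it1}.$$

Given the conditional normality assumption for $e_{it1}$, pooled ordered probit of

$$y_{it1} \text{ on } \mathbf{z}_{it1}, y_{it2}, 1, \bar{\mathbf{z}}_i, v_{it2},\quad t = 1,\ldots,T;\ i = 1,\ldots,N$$

consistently estimates all parameters, including the cut parameters, multiplied by $1/\tau_1$. The two-step method is then (1) Estimate $\boldsymbol{\delta}_2$, $\psi_2$, and $\boldsymbol{\xi}_2$ by pooled OLS of

$$y_{it2} \text{ on } \mathbf{z}_{it}, 1, \bar{\mathbf{z}}_i,\quad t = 1,\ldots,T;\ i = 1,\ldots,N.$$

This is equivalent to fixed effects estimation of $\boldsymbol{\delta}_2$. Obtain the residuals, $\hat{v}_{it2}$. (2) Do pooled OP of

$$y_{it1} \text{ on } \mathbf{z}_{it1}, y_{it2}, 1, \bar{\mathbf{z}}_i, \hat{v}_{it2},\quad t = 1,\ldots,T;\ i = 1,\ldots,N$$

to obtain $\hat{\boldsymbol{\delta}}_{g1}$, $\hat{\gamma}_{g1}$, and so on. A simple extension is to interact $\hat{v}_{it2}$ with time dummies to allow the regression of $v_{it1}$ on $v_{it2}$ to change over time.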


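b. Define a dummy variable $w_{itj} = 1[y_{it1} = j]$. Then

$$\begin{aligned}
w_{itj} &= 1[\alpha_j < y_{it1}^* \leq \alpha_{j+1}] \\
&= 1[\alpha_j < \mathbf{z}_{it1}\boldsymbol{\delta}_1 + \gamma_1 y_{it2} + c_{i1} + u_{it1} \leq \alpha_{j+1}].
\end{aligned}$$

The ASF for $w_{itj}$ is obtained by computing the expected value of the right hand side with respect to the unobservable $c_{i1} + u_{it1}$ at specific values $(\mathbf{z}_{t1}, y_{t2})$:

$$ASF(\mathbf{z}_{t1}, y_{t2}) = E_{r_{it1}}\{1[\alpha_j < \mathbf{z}_{t1}\boldsymbol{\delta}_1 + \gamma_1 y_{t2} + r_{it1} \leq \alpha_{j+1}]\}$$

where $r_{it1} = c_{i1} + u_{it1}$. Note that $r_{it1} = \psi_1 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_1 + v_{it1}$ and so we can compute the ASF by taking the average over $(\bar{\mathbf{z}}_i, v_{it1})$:

$$\begin{aligned}
ASF(\mathbf{z}_{t1}, y_{t2}) &= E_{(\bar{\mathbf{z}}_i, v_{it1})}\{1[\alpha_j < \mathbf{z}_{t1}\boldsymbol{\delta}_1 + \gamma_1 y_{t2} + \psi_1 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_1 + v_{it1} \leq \alpha_{j+1}]\} \\
&= E_{(\bar{\mathbf{z}}_i, v_{it2}, e_{it1})}\{1[\alpha_j < \mathbf{z}_{t1}\boldsymbol{\delta}_1 + \gamma_1 y_{t2} + \psi_1 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_1 + \theta_1 v_{it2} + e_{it1} \leq \alpha_{j+1}]\}.
\end{aligned}$$

Now we can apply iterated expectations. First find $E(\cdot|\bar{\mathbf{z}}_i, v_{it2})$ and then average out $(\bar{\mathbf{z}}_i, v_{it2})$. Now

$$\begin{aligned}
E(1[\alpha_j < \mathbf{z}_{t1}\boldsymbol{\delta}_1 &+ \gamma_1 y_{t2} + \psi_1 + \bar{\mathbf{z}}_i\boldsymbol{\xi}_1 + \theta_1 v_{it2} + e_{it1} \leq \alpha_{j+1}]\,|\,\bar{\mathbf{z}}_i, v_{it2}) \\
&= \Phi(\alpha_{g,j+1} - \mathbf{z}_{t1}\boldsymbol{\delta}_{g1} - \gamma_{g1}y_{t2} - \psi_{g1} - \bar{\mathbf{z}}_i\boldsymbol{\xi}_{g1} - \theta_{g1}v_{it2}) \\
&\quad - \Phi(\alpha_{gj} - \mathbf{z}_{t1}\boldsymbol{\delta}_{g1} - \gamma_{g1}y_{t2} - \psi_{g1} - \bar{\mathbf{z}}_i\boldsymbol{\xi}_{g1} - \theta_{g1}v_{it2})
\end{aligned}$$

where “g” denotes divided by $\tau_1$. We use the fact that $D(e_{it1}|v_{it2},\mathbf{z}_i) = \text{Normal}(0, \tau_1^2)$. It follows now by iterated expectations that

$$\begin{aligned}
ASF(\mathbf{z}_{t1}, y_{t2}) = E_{(\bar{\mathbf{z}}_i, v_{it2})}[&\Phi(\alpha_{g,j+1} - \mathbf{z}_{t1}\boldsymbol{\delta}_{g1} - \gamma_{g1}y_{t2} - \psi_{g1} - \bar{\mathbf{z}}_i\boldsymbol{\xi}_{g1} - \theta_{g1}v_{it2}) \\
&- \Phi(\alpha_{gj} - \mathbf{z}_{t1}\boldsymbol{\delta}_{g1} - \gamma_{g1}y_{t2} - \psi_{g1} - \bar{\mathbf{z}}_i\boldsymbol{\xi}_{g1} - \theta_{g1}v_{it2})].
\end{aligned}$$

c. To estimate the ASF, we plug in estimates and use a sample average:

$$\begin{aligned}
\widehat{ASF}(\mathbf{z}_{t1}, y_{t2}) = N^{-1}\sum_{i=1}^{N}[&\Phi(\hat{\alpha}_{g,j+1} - \mathbf{z}_{t1}\hat{\boldsymbol{\delta}}_{g1} - \hat{\gamma}_{g1}y_{t2} - \hat{\psi}_{g1} - \bar{\mathbf{z}}_i\hat{\boldsymbol{\xi}}_{g1} - \hat{\theta}_{g1}\hat{v}_{it2}) \\
&- \Phi(\hat{\alpha}_{gj} - \mathbf{z}_{t1}\hat{\boldsymbol{\delta}}_{g1} - \hat{\gamma}_{g1}y_{t2} - \hat{\psi}_{g1} - \bar{\mathbf{z}}_i\hat{\boldsymbol{\xi}}_{g1} - \hat{\theta}_{g1}\hat{v}_{it2})].
\end{aligned}$$

As usual, the estimated APEs are derivatives or changes with respect to $(\mathbf{z}_{t1}, y_{t2})$. To get valid standard errors, we can use Problem 12.17 or the panel bootstrap, where both estimation steps are carried out with each resampling.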

d. The two-step control function procedure does not require any assumptions about the relationship between $v_{it1}$ and $v_{ir2}$ for $t \neq r$. In other words, while adding $v_{it2}$ as a control function renders $y_{it2}$ contemporaneously exogenous in the estimating equation ($e_{it1}$ is independent of $y_{it2}$ and $\mathbf{z}_i$), $\{y_{it2}\}$ is not generally strictly exogenous. An important implication is that we should not apply a method such as generalized estimating equations in the second stage.

A method that would render $\{y_{it2}\}$ strictly exogenous would be to project $v_{it1}$ on the entire history $\{v_{ir2}: r = 1,\ldots,T\}$. There are assumptions under which the projection depends only on $v_{it2}$ and the time average, $\bar{v}_{i2}$. So, we could write

$$v_{it1} = \theta_1 v_{it2} + \eta_1\bar{v}_{i2} + e_{it1}$$

and assume $e_{it1}$ is independent of $\mathbf{v}_{i2} = (v_{i12},\ldots,v_{iT2})'$. Then $e_{it1}$ would be independent of $(\mathbf{z}_i, \mathbf{y}_{i2})$ (under the other maintained assumptions). So, at each time period, $(\hat{v}_{it2}, \bar{\hat{v}}_{i2})$ can be added to the ordered probit; that is, we apply the Mundlak device to the reduced form residuals. In addition to pooled OP, one could use a GEE-like procedure in the second stage.

More flexibility would be gotten by using the more general Chamberlain formulation:

$$v_{it1} = \mathbf{v}_{i2}'\boldsymbol{\theta}_{t1} + e_{it1}$$

where $\boldsymbol{\theta}_{t1}$ is $T \times 1$ for each $t$. Then in each time period include $\hat{\mathbf{v}}_{i2}$ as a set of regressors interacted with time-period dummies.
Solutions to Chapter 17 Problems

17.1. a. No. Because $\log(1) = 0$ and $\log(\cdot)$ is strictly increasing, $P[\log(1 + y) = 0] = P(y = 0) > 0$. Of course, $\log(1 + y)$ increases much more slowly than $y$, and so one could use $\log(1 + y)$ to reduce the influence of “unusually” large observations $y_i$ in linear regression. Also, remembering that the type I Tobit can be obtained from a latent variable model, the transformation $\log(1 + y)$ might make the normality and homoskedasticity assumptions in the latent variable formulation more plausible.

b. We can just use ordinary least squares. OLS will be consistent for B (and even 
conditionally unbiased). Our inference should be made robust to heteroskedasticity because the 
restriction r > —xB needs to hold, meaning r cannot be independent of x (unless we restrict the 
range of r or x somewhat arbitrarily). 


c. Exponentiate and subtract one to get

$$y = \exp(\mathbf{x}\boldsymbol{\beta} + r) - 1 = \exp(\mathbf{x}\boldsymbol{\beta})\exp(r) - 1.$$

Now take the expectation conditional on $\mathbf{x}$:

$$E(y|\mathbf{x}) = \exp(\mathbf{x}\boldsymbol{\beta})E[\exp(r)|\mathbf{x}] - 1.$$

If we assume $r$ is independent of $\mathbf{x}$ then $E[\exp(r)|\mathbf{x}] = E[\exp(r)] \equiv \eta$, and so

$$E(y|\mathbf{x}) = \eta\exp(\mathbf{x}\boldsymbol{\beta}) - 1.$$

d. Because $\eta = E[\exp(r)]$, an unbiased and consistent estimator of $\eta$ would be

$$N^{-1}\sum_{i=1}^{N}\exp(r_i),$$

if we observed the random sample of errors, $\{r_i : i = 1,2,\ldots,N\}$. Instead, we follow Duan’s (1983) “smearing” approach (which is really just a method of moments approach) and replace the errors with the OLS residuals, $\hat{r}_i$, from the regression $\log(1 + y_i)$ on $\mathbf{x}_i$. Then a consistent estimator of $\eta$ is

$$\hat{\eta} = N^{-1}\sum_{i=1}^{N}\exp(\hat{r}_i),$$

which is guaranteed to be greater than one by Jensen’s inequality, because the OLS residuals average to zero. Note $\eta$ is also greater than unity by Jensen’s inequality:

$$\eta = E[\exp(r)] > \exp[E(r)] = \exp(0) = 1.$$

e. The estimated conditional mean function is simply

$$\hat{E}(y|\mathbf{x}) = \hat{\eta}\exp(\mathbf{x}\hat{\boldsymbol{\beta}}) - 1.$$

It is not guaranteed to be nonnegative because the estimates $\hat{\boldsymbol{\beta}}$ and $\hat{\eta}$ have not been chosen to ensure nonnegativity. It is possible that, for some vectors $\mathbf{x}$, $\hat{\eta}\exp(\mathbf{x}\hat{\boldsymbol{\beta}}) < 1$.
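The smearing steps in parts d and e can be sketched in a few lines of Python. This is only an illustration on simulated data: the data-generating process, the sample size, and the names `eta_hat` and `mean_hat` are assumptions made for the example, and a single regressor is used so that OLS has a closed form without any external library.

```python
import math
import random

random.seed(0)
n = 500
x = [random.uniform(0.0, 2.0) for _ in range(n)]
# Hypothetical outcome with a corner at zero (not the data used in the text).
y = [max(0.0, math.exp(0.5 + xi + random.gauss(0.0, 0.8)) - 1.0) for xi in x]

# OLS of log(1 + y) on a constant and x.
ly = [math.log(1.0 + yi) for yi in y]
xbar, lybar = sum(x) / n, sum(ly) / n
sxy = sum((xi - xbar) * (li - lybar) for xi, li in zip(x, ly))
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = lybar - b1 * xbar
resid = [li - b0 - b1 * xi for xi, li in zip(x, ly)]

# Smearing estimate: the sample average of exp(residual).
eta_hat = sum(math.exp(r) for r in resid) / n

def mean_hat(xval):
    """Estimated E(y|x) = eta_hat * exp(b0 + b1*x) - 1."""
    return eta_hat * math.exp(b0 + b1 * xval) - 1.0

print(eta_hat, mean_hat(1.0))
```

Because the residuals from a regression with an intercept average to zero, the computed `eta_hat` exceeds one, in line with the Jensen's inequality argument.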

f. The Stata output follows. The estimated $\eta$ is $\hat{\eta} = 17.18$, which is much higher than unity. None of the fitted values are negative; they range from about .061 to 45,202. The largest prediction is almost 10 times above the largest observed hours in the data set, and the average of the fitted values, 3,166, is much too high: the average of actual hours is 740.6. Therefore, for predicting hours, using log(1 + hours) in a linear regression is not very appealing.
. gen lhoursp1 = log(1 + hours)

. reg lhoursp1 nwifeinc educ exper expersq age kidslt6 kidsge6, robust

Linear regression                                 Number of obs   =        753
                                                  F(  7,   745)   =      73.12
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.2950
                                                  Root MSE        =     2.9367

------------------------------------------------------------------------------
             |               Robust
    lhoursp1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0228321   .0098273    -2.32   0.020    -.0421247   -.0035395
        educ |   .2271644   .0507032     4.48   0.000     .1276262    .3267027
       exper |   .2968677   .0407256     7.29   0.000     .2169171    .3768182
     expersq |  -.0043383   .0013579    -3.19   0.001     -.007004   -.0016726
         age |   -.122754   .0163732    -7.50   0.000    -.1548971   -.0906109
     kidslt6 |  -1.991432   .2110337    -9.44   0.000    -2.405724   -1.577141
     kidsge6 |   .0372724   .0917873     0.41   0.685    -.1429201    .2174649
       _cons |   4.833966   1.050092     4.60   0.000     2.772473    6.895458
------------------------------------------------------------------------------

. predict xbhat
(option xb assumed; fitted values)

. predict rhat, resid

. gen exprhat = exp(rhat)

. sum exprhat

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     exprhat |       753    17.17622     69.2013   .0012194       1045

. gen hourshat = 17.17622*exp(xbhat)

. sum hours hourshat

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       hours |       753    740.5764    871.3142          0       4950
    hourshat |       753    3166.422    5164.107    .061139   45202.41


g. The R-squared is computed in the Stata output that follows. It is about .159, which is 


substantially below not only the Tobit R-squared, .275, but also the linear regression 


R-squared, .266. For this data set, using log(1 + y) in a linear regression does not work well. 


. corr hours hourshat
(obs=753)

             |    hours hourshat
-------------+------------------
       hours |   1.0000
    hourshat |   0.3984   1.0000

. di .3984^2
.15872256

h. Under the null of independence between $r_i$ and $\mathbf{x}_i$, we should find no significant relationship between $\hat{r}_i^2$ and any function of $\mathbf{x}_i$. Yet the $F$ (that is, modified Wald) statistic for heteroskedasticity has a $p$-value of zero to more than four decimal places. Clearly, $r_i$ is not independent of $\mathbf{x}_i$.

. gen rhatsq = rhat^2

. gen xbhatsq = xbhat^2

. reg rhatsq xbhat xbhatsq

      Source |       SS       df       MS
-------------+------------------------------
       Model |  5304.11081     2  2652.05541
    Residual |   53062.278   750   70.749704

------------------------------------------------------------------------------
      rhatsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       xbhat |   3.840246   .4952455     7.75   0.000     2.868013    4.812478
     xbhatsq |   -.571263   .0665162    -8.59   0.000    -.7018431   -.4406829
       _cons |   4.286994   .9089565     4.72   0.000     2.502592    6.071395
------------------------------------------------------------------------------


17.2. a. No. The two-limit Tobit only makes sense if there is a corner at both endpoints. 


With P(y = 0) = 0 the two-limit model becomes a one-limit model at unity, which means the 


model does not imply a zero density for y < 0. The estimates will be identical to the Tobit 


model with an upper corner at unity. 


b. Over the range $(0, 1]$, $w \equiv -\log(y)$ takes values in $[0, \infty)$, with $P(w = 0) = P(y = 1) > 0$. Assuming that $y$ is continuous on $(0, 1)$, $w$ is continuous over $(0, \infty)$. So $w$ is nonnegative, has a pile up at zero, and is continuously distributed over strictly positive values. A type I Tobit model makes logical sense.


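c. It takes some work, but it is tractable. We can write $y = \exp(-w)$ but we cannot just pass the expected value through the exponential function. One way to proceed is to write $w = \max(0, \mathbf{x}\boldsymbol{\beta} + u)$ where $u|\mathbf{x} \sim \text{Normal}(0, \sigma^2)$, so $y = \exp[-\max(0, \mathbf{x}\boldsymbol{\beta} + u)]$. Then, using $\exp(0) = 1$ and splitting the integral over $u < -\mathbf{x}\boldsymbol{\beta}$ and $u \geq -\mathbf{x}\boldsymbol{\beta}$,

$$\begin{aligned}
E(y|\mathbf{x}) &= \int_{-\infty}^{\infty}\exp[-\max(0, \mathbf{x}\boldsymbol{\beta} + u)](1/\sigma)\phi(u/\sigma)\,du \\
&= \int_{-\infty}^{-\mathbf{x}\boldsymbol{\beta}}(1/\sigma)\phi(u/\sigma)\,du + \int_{-\mathbf{x}\boldsymbol{\beta}}^{\infty}\exp[-(\mathbf{x}\boldsymbol{\beta} + u)](1/\sigma)\phi(u/\sigma)\,du \\
&= \Phi(-\mathbf{x}\boldsymbol{\beta}/\sigma) + \exp(-\mathbf{x}\boldsymbol{\beta})\int_{-\mathbf{x}\boldsymbol{\beta}}^{\infty}\exp(-u)(1/\sigma)\phi(u/\sigma)\,du \\
&= \Phi(-\mathbf{x}\boldsymbol{\beta}/\sigma) + \exp(-\mathbf{x}\boldsymbol{\beta} + \sigma^2/2)\,\Phi[(\mathbf{x}\boldsymbol{\beta} - \sigma^2)/\sigma],
\end{aligned}$$

where the last step completes the square: $\exp(-u)(1/\sigma)\phi(u/\sigma) = \exp(\sigma^2/2)(1/\sigma)\phi[(u + \sigma^2)/\sigma]$, which is $\exp(\sigma^2/2)$ times the Normal$(-\sigma^2, \sigma^2)$ density.

Although it is not obvious from the expression, this conditional mean function is bounded between zero and one, as it must be because $0 < y \leq 1$.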
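As a numerical check on the conditional mean of $y = \exp[-\max(0, \mathbf{x}\boldsymbol{\beta} + u)]$, the Python sketch below compares the closed form $\Phi(-\mathbf{x}\boldsymbol{\beta}/\sigma) + \exp(-\mathbf{x}\boldsymbol{\beta} + \sigma^2/2)\Phi[(\mathbf{x}\boldsymbol{\beta} - \sigma^2)/\sigma]$, obtained by completing the square in the normal integral, with a direct midpoint-rule evaluation of the integral. The function names, the grid size, and the parameter values are assumptions chosen for illustration.

```python
import math

def norm_cdf(z):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def mean_closed_form(xb, sigma):
    """Phi(-xb/s) + exp(-xb + s^2/2) * Phi((xb - s^2)/s)."""
    return (norm_cdf(-xb / sigma)
            + math.exp(-xb + 0.5 * sigma ** 2) * norm_cdf((xb - sigma ** 2) / sigma))

def mean_numeric(xb, sigma, n=200000, width=10.0):
    """Midpoint-rule integral of exp[-max(0, xb + u)] against the N(0, s^2) density."""
    lo = -width * sigma
    h = 2.0 * width * sigma / n
    total = 0.0
    for k in range(n):
        u = lo + (k + 0.5) * h
        total += math.exp(-max(0.0, xb + u)) * norm_pdf(u / sigma) / sigma
    return total * h

# Hypothetical (xb, sigma) pairs for illustration.
cases = [(0.5, 1.0), (-0.3, 2.0)]
closed = [mean_closed_form(xb, s) for xb, s in cases]
numeric = [mean_numeric(xb, s) for xb, s in cases]
print(closed, numeric)
```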
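17.3. a. Because $y = a_1$ if and only if $y^* \leq a_1$, we have

$$\begin{aligned}
P(y = a_1|\mathbf{x}) &= P(y^* \leq a_1|\mathbf{x}) = P(\mathbf{x}\boldsymbol{\beta} + u \leq a_1|\mathbf{x}) \\
&= P[(u/\sigma) \leq (a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma|\mathbf{x}] \\
&= \Phi[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma].
\end{aligned}$$

Similarly,

$$\begin{aligned}
P(y = a_2|\mathbf{x}) &= P(y^* \geq a_2|\mathbf{x}) = P(\mathbf{x}\boldsymbol{\beta} + u \geq a_2|\mathbf{x}) \\
&= P[(u/\sigma) \geq (a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma] = 1 - \Phi[(a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma] \\
&= \Phi[-(a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma].
\end{aligned}$$

Next, for $a_1 < y < a_2$, $P(y_i \leq y|\mathbf{x}) = P(y^* \leq y|\mathbf{x}) = \Phi[(y - \mathbf{x}\boldsymbol{\beta})/\sigma]$. Taking the derivative of this cdf with respect to $y$ gives the pdf of $y$ conditional on $\mathbf{x}$ for values $y$ strictly between $a_1$ and $a_2$: $(1/\sigma)\phi[(y - \mathbf{x}\boldsymbol{\beta})/\sigma]$.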
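b. Because $y = y^*$ when $a_1 < y^* < a_2$, $E(y|\mathbf{x}, a_1 < y < a_2) = E(y^*|\mathbf{x}, a_1 < y^* < a_2)$. But $y^* = \mathbf{x}\boldsymbol{\beta} + u$ and $a_1 < y^* < a_2$ if and only if $a_1 - \mathbf{x}\boldsymbol{\beta} < u < a_2 - \mathbf{x}\boldsymbol{\beta}$. Therefore, using the hint,

$$\begin{aligned}
E(y^*|\mathbf{x}, a_1 < y^* < a_2) &= \mathbf{x}\boldsymbol{\beta} + E(u|\mathbf{x}, a_1 - \mathbf{x}\boldsymbol{\beta} < u < a_2 - \mathbf{x}\boldsymbol{\beta}) \\
&= \mathbf{x}\boldsymbol{\beta} + \sigma E[(u/\sigma)|\mathbf{x}, (a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma < u/\sigma < (a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma] \\
&= \mathbf{x}\boldsymbol{\beta} + \frac{\sigma\{\phi[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma] - \phi[(a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma]\}}{\{\Phi[(a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma] - \Phi[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma]\}} \\
&= E(y|\mathbf{x}, a_1 < y < a_2).
\end{aligned}$$

Now, we can easily get $E(y|\mathbf{x})$ by using the following:

$$\begin{aligned}
E(y|\mathbf{x}) &= a_1 P(y = a_1|\mathbf{x}) + E(y|\mathbf{x}, a_1 < y < a_2)\cdot P(a_1 < y < a_2|\mathbf{x}) + a_2 P(y = a_2|\mathbf{x}) \\
&= a_1\Phi[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma] \\
&\quad + E(y|\mathbf{x}, a_1 < y < a_2)\cdot\{\Phi[(a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma] - \Phi[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma]\} \\
&\quad + a_2\Phi[(\mathbf{x}\boldsymbol{\beta} - a_2)/\sigma] \\
&= a_1\Phi[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma] + (\mathbf{x}\boldsymbol{\beta})\cdot\{\Phi[(a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma] - \Phi[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma]\} \\
&\quad + \sigma\{\phi[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma] - \phi[(a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma]\} \\
&\quad + a_2\Phi[(\mathbf{x}\boldsymbol{\beta} - a_2)/\sigma].
\end{aligned}$$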


c. From part b it is clear that $E(y^*|\mathbf{x}, a_1 < y^* < a_2) \neq \mathbf{x}\boldsymbol{\beta}$, and so it would be a fluke if OLS on the restricted sample consistently estimated $\boldsymbol{\beta}$. The linear regression of $y_i$ on $\mathbf{x}_i$ using only those $y_i$ such that $a_1 < y_i < a_2$ consistently estimates the linear projection of $y^*$ on $\mathbf{x}$ in the subpopulation for which $a_1 < y^* < a_2$. Generally, there is no reason to think that this will have any simple relationship to the parameter vector $\boldsymbol{\beta}$. [In some restrictive cases, the regression on the restricted subsample could consistently estimate $\boldsymbol{\beta}$ up to a common scale coefficient.]


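d. We get the log-likelihood immediately from part a:

$$\begin{aligned}
\ell_i(\boldsymbol{\theta}) &= 1[y_i = a_1]\log\{\Phi[(a_1 - \mathbf{x}_i\boldsymbol{\beta})/\sigma]\} \\
&\quad + 1[y_i = a_2]\log\{\Phi[(\mathbf{x}_i\boldsymbol{\beta} - a_2)/\sigma]\} \\
&\quad + 1[a_1 < y_i < a_2]\log\{(1/\sigma)\phi[(y_i - \mathbf{x}_i\boldsymbol{\beta})/\sigma]\}.
\end{aligned}$$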


Note how the indicator function selects out the appropriate density for each of the three 
possible cases: at the left endpoint, at the right endpoint, or strictly between the endpoints. 
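The way the indicator functions select among the three cases can be made concrete with a small Python sketch of the log likelihood for a single observation. The function name `loglik_obs` and the parameter values are hypothetical; the check at the end uses the fact that the two corner probabilities and the probability of falling strictly between the limits must sum to one.

```python
import math

def norm_cdf(z):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def loglik_obs(y, xb, sigma, a1, a2):
    """Two-limit Tobit log likelihood for one observation with index xb."""
    if y <= a1:                                  # at the lower corner
        return math.log(norm_cdf((a1 - xb) / sigma))
    if y >= a2:                                  # at the upper corner
        return math.log(norm_cdf((xb - a2) / sigma))
    # strictly between the corners: log of the normal density
    return math.log(norm_pdf((y - xb) / sigma) / sigma)

# Hypothetical values for illustration.
a1, a2, xb, sigma = 0.0, 1.0, 0.4, 0.5
p_low = math.exp(loglik_obs(a1, xb, sigma, a1, a2))
p_high = math.exp(loglik_obs(a2, xb, sigma, a1, a2))
p_inside = norm_cdf((a2 - xb) / sigma) - norm_cdf((a1 - xb) / sigma)
print(p_low, p_inside, p_high)
```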

e. After obtaining the maximum likelihood estimates $\hat{\boldsymbol{\beta}}$ and $\hat{\sigma}^2$, just plug these into the formulas in part b. The expressions can be evaluated at interesting values of $\mathbf{x}$.

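f. We can show this by brute-force differentiation of the expression in part b for $E(y|\mathbf{x})$. As a shorthand, write

$$\phi_1 \equiv \phi[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma],\quad \phi_2 \equiv \phi[(a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma] = \phi[(\mathbf{x}\boldsymbol{\beta} - a_2)/\sigma],$$

$$\Phi_1 \equiv \Phi[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma],\ \text{and}\ \Phi_2 \equiv \Phi[(a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma].$$

Then

$$\begin{aligned}
\frac{\partial E(y|\mathbf{x})}{\partial x_j} &= -(a_1/\sigma)\phi_1\beta_j + (a_2/\sigma)\phi_2\beta_j \\
&\quad + (\Phi_2 - \Phi_1)\beta_j + [(\mathbf{x}\boldsymbol{\beta}/\sigma)(\phi_1 - \phi_2)]\beta_j \\
&\quad + \{[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma]\phi_1\}\beta_j - \{[(a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma]\phi_2\}\beta_j
\end{aligned}$$

where the first two parts are the derivatives of the first and third terms, respectively, in $E(y|\mathbf{x})$, and the last two lines are obtained from differentiating the second term in $E(y|\mathbf{x})$. Careful inspection shows that all terms cancel except $(\Phi_2 - \Phi_1)\beta_j$, which is the expression we wanted to be left with.

The scale factor,

$$\Phi[(a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma] - \Phi[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma],$$

is simply the probability that a standard normal random variable falls in the interval $[(a_1 - \mathbf{x}\boldsymbol{\beta})/\sigma, (a_2 - \mathbf{x}\boldsymbol{\beta})/\sigma]$, which is necessarily between zero and one.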
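The cancellation claimed for the partial effect can be checked numerically. The Python sketch below codes the expression for $E(y|\mathbf{x})$ in the two-limit Tobit model and compares a central finite difference with the analytical partial effect, the scale factor times $\beta_j$. All names and numerical values are hypothetical and serve only as an illustration.

```python
import math

def norm_cdf(z):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def mean_two_limit(xb, sigma, a1, a2):
    """E(y|x) for the two-limit Tobit, as a function of the index xb."""
    z1 = (a1 - xb) / sigma
    z2 = (a2 - xb) / sigma
    return (a1 * norm_cdf(z1)
            + xb * (norm_cdf(z2) - norm_cdf(z1))
            + sigma * (norm_pdf(z1) - norm_pdf(z2))
            + a2 * norm_cdf(-z2))

def scale_factor(xb, sigma, a1, a2):
    """Probability that y* falls strictly between the two limits."""
    return norm_cdf((a2 - xb) / sigma) - norm_cdf((a1 - xb) / sigma)

# Hypothetical values for illustration.
a1, a2, sigma, xb, beta_j = 0.0, 1.0, 0.5, 0.4, 0.8
h = 1e-5
# Changing x_j by h changes the index by beta_j * h.
fd = (mean_two_limit(xb + beta_j * h, sigma, a1, a2)
      - mean_two_limit(xb - beta_j * h, sigma, a1, a2)) / (2.0 * h)
analytic = scale_factor(xb, sigma, a1, a2) * beta_j
print(fd, analytic)
```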


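g. The partial effects on $E(y|\mathbf{x})$ are given in f. These are estimated as

$$\left\{\Phi\!\left(\frac{a_2 - \mathbf{x}\hat{\boldsymbol{\beta}}}{\hat{\sigma}}\right) - \Phi\!\left(\frac{a_1 - \mathbf{x}\hat{\boldsymbol{\beta}}}{\hat{\sigma}}\right)\right\}\hat{\beta}_j,$$

where the estimates are the MLEs. We could evaluate these partial effects at, say, $\bar{\mathbf{x}}$ to estimate the PEA (partial effect at the average). Or, we can estimate the scale factor for the APE of continuous $x_j$ as

$$\hat{\rho} \equiv N^{-1}\sum_{i=1}^{N}\left\{\Phi\!\left(\frac{a_2 - \mathbf{x}_i\hat{\boldsymbol{\beta}}}{\hat{\sigma}}\right) - \Phi\!\left(\frac{a_1 - \mathbf{x}_i\hat{\boldsymbol{\beta}}}{\hat{\sigma}}\right)\right\}.$$

Particularly for the APE, the scaled Tobit coefficients can be compared with the OLS coefficients (the $\hat{\gamma}_j$). Generally, we expect

$$\hat{\gamma}_j \approx \hat{\rho}\cdot\hat{\beta}_j,$$

where $0 < \hat{\rho} < 1$. Of course, this approximation need not be very good in a particular application, but often it is. It does not make sense to directly compare the magnitude of $\hat{\beta}_j$ with that of $\hat{\gamma}_j$. By the way, note that $\hat{\sigma}$ appears in the partial effects along with the $\hat{\beta}_j$.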

17.4. The Stata output is below. The heteroskedasticity-robust standard error on grant is quite a bit bigger, but the robust $t$ statistic is above four. (Interestingly, the heteroskedasticity-robust standard error for union is substantially smaller than the usual standard error.) The coefficient on grant implies that a firm receiving a job training grant in 1988 is estimated to provide about 27.2 more hours of job training per worker, holding firm size and union status fixed. This effect is very large considering the average hours of annual training over all 127 firms is about 16.
. use jtrain1

. des hrsemp grant

              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
hrsemp          float  %9.0g                  tothrs/totrain
grant           byte   %9.0g                  = 1 if received grant

. reg hrsemp grant lemploy union if d88

      Source |       SS       df       MS              Number of obs =     127
-------------+------------------------------           F(  3,   123) =   14.58
       Model |  23232.2579     3  7744.08598           Prob > F      =  0.0000
    Residual |  65346.8909   123  531.275536           R-squared     =  0.2623
-------------+------------------------------           Adj R-squared =  0.2443
       Total |  88579.1488   126  703.009118           Root MSE      =  23.049

------------------------------------------------------------------------------
      hrsemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |   27.17647   4.769283     5.70   0.000     17.73597    36.61698
     lemploy |  -5.511867   2.012923    -2.74   0.007    -9.496324   -1.527409
       union |  -8.924901   5.392118    -1.66   0.100    -19.59827    1.748465
       _cons |   30.76978   7.345811     4.19   0.000      16.2292    45.31037
------------------------------------------------------------------------------

. reg hrsemp grant lemploy union if d88, robust

Linear regression                                      Number of obs =     127

------------------------------------------------------------------------------
             |               Robust
      hrsemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |   27.17647   6.525922     4.16   0.000     14.25881    40.09414
     lemploy |  -5.511867    2.17685    -2.53   0.013    -9.820807   -1.202926
       union |  -8.924901   3.181306    -2.81   0.006     -15.2221   -2.627702
       _cons |   30.76978   8.558935     3.60   0.000      13.8279    47.71167
------------------------------------------------------------------------------

b. The Tobit results are below. Out of 127 firms in 1988, 38 provide no job training.

. count if hrsemp == 0 & d88
   38

. tobit hrsemp grant lemploy union if d88, ll(0)

Tobit regression                                  Number of obs   =        127
Log likelihood = -451.88026

------------------------------------------------------------------------------
      hrsemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grant |   36.34335   6.121823     5.94   0.000     24.22655    48.46016
     lemploy |  -4.928542   2.656817    -1.86   0.066    -10.18713     .330044
       union |  -12.63617   7.286913    -1.73   0.085    -27.05901    1.786677
       _cons |   20.32933   9.769517     2.08   0.040     .9927198    39.66594
-------------+----------------------------------------------------------------
      /sigma |   28.70726   2.229537                      24.29438
------------------------------------------------------------------------------
  Obs. summary:         38  left-censored observations at hrsemp<=0
                        89     uncensored observations
                         0 right-censored observations

The language “left censored at hrsemp <= 0” is misleading for corner solution applications, but it does tell us that 38 of the 127 firms have hrsemp = 0. The estimate of $\sigma$ is $\hat{\sigma} = 28.71$.

To get the effect of grant on $E(hrsemp|grant, employ, union, hrsemp > 0)$, we must compute the inverse Mills ratio with grant = 1 and grant = 0. We set $employ = \overline{employ} = 60.87$ and union = 1. Below is the Stata session.


. gen xb1 = _b[_cons] + _b[grant] + _b[lemploy]*log(60.87) + _b[union]

. gen xb0 = _b[_cons] + _b[lemploy]*log(60.87) + _b[union]

. gen prob1 = normal(xb1/_b[/sigma])

. gen prob0 = normal(xb0/_b[/sigma])

. gen imr1 = normalden(xb1/_b[/sigma])/prob1

. gen imr0 = normalden(xb0/_b[/sigma])/prob0

. gen cm1 = xb1 + _b[/sigma]*imr1

. gen cm0 = xb0 + _b[/sigma]*imr0

. gen dcm = cm1 - cm0

. list dcm in 1

. gen um1 = prob1*cm1

. gen um0 = prob0*cm0

. gen dum = um1 - um0

. list dum in 1

  1. | 20.81422 |


For firms already doing some job training, the grant is estimated to increase training by 
about 15.1 hours per employee. When we add in the effects of firms that go from no training to 
positive hours, the expected change is about 20.8 hours at union = 1 and the average value of 
employ in the sample. This is somewhat less than the OLS estimate we obtained earlier, 27.2. 
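The conditional and unconditional mean calculations can be reproduced outside Stata from the reported Tobit estimates. The Python sketch below is only an illustration: it hard-codes the coefficients and $\hat{\sigma}$ from the Tobit output above, and the helper name `cond_and_uncond_mean` is hypothetical. It evaluates both means at grant = 1 and grant = 0, with employ = 60.87 and union = 1.

```python
import math

def norm_cdf(z):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Tobit estimates as reported in the output above.
b_cons, b_grant, b_lemploy, b_union = 20.32933, 36.34335, -4.928542, -12.63617
sigma = 28.70726

def cond_and_uncond_mean(xb, sigma):
    """Return (E(y | x, y > 0), E(y | x)) for a Tobit with a corner at zero."""
    z = xb / sigma
    prob = norm_cdf(z)            # P(y > 0 | x)
    imr = norm_pdf(z) / prob      # inverse Mills ratio
    cm = xb + sigma * imr         # conditional mean given y > 0
    return cm, prob * cm          # unconditional mean = P(y > 0 | x) * cm

xb0 = b_cons + b_lemploy * math.log(60.87) + b_union   # grant = 0
xb1 = xb0 + b_grant                                    # grant = 1
cm0, um0 = cond_and_uncond_mean(xb0, sigma)
cm1, um1 = cond_and_uncond_mean(xb1, sigma)
dcm = cm1 - cm0   # effect on E(hrsemp | x, hrsemp > 0)
dum = um1 - um0   # effect on E(hrsemp | x)
print(dcm, dum)
```

The two differences correspond to the variables `dcm` and `dum` in the Stata session.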

The estimated APE on the unconditional mean is computed below as 26.2, which is pretty close to the OLS estimate of 27.2. Bootstrapping can be used to obtain a valid standard error.


. predict xb, xb
(31 missing values generated)

. sum xb if d88

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          xb |       146    9.095312    16.80602  -18.41981    48.7405

. replace xb = . if ~d88
(294 real changes made, 294 to missing)

. replace xb = . if hrsemp == . | lemploy == . | union == .
(19 real changes made, 19 to missing)

. sum xb

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          xb |       127    9.265182    17.20874  -18.41981    48.7405

. gen xb0 = xb - _b[grant]*grant
(344 missing values generated)

. gen xb1 = xb0 + _b[grant]
(344 missing values generated)

. gen prob0 = normal(xb0/_b[/sigma])
(344 missing values generated)

. gen prob1 = normal(xb1/_b[/sigma])
(344 missing values generated)

. gen imr0 = normalden(xb0/_b[/sigma])/prob0
(344 missing values generated)

. gen imr1 = normalden(xb1/_b[/sigma])/prob1
(344 missing values generated)

. gen cm0 = xb0 + _b[/sigma]*imr0
(344 missing values generated)

. gen cm1 = xb1 + _b[/sigma]*imr1
(344 missing values generated)

. gen um0 = prob0*cm0
(344 missing values generated)

. gen um1 = prob1*cm1
(344 missing values generated)

. gen pe = um1 - um0
(344 missing values generated)

. sum pe

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          pe |       127       26.23    3.272887   16.88082   30.56553


c. They are jointly significant at the 1.5% level, as the Stata “test” command shows.

. test lemploy union

 ( 1)  [model]lemploy = 0
 ( 2)  [model]union = 0

       F(  2,   124) =    4.34
            Prob > F =    0.0151


d. For the Tobit model, I use the square of the correlation between $y_i = hrsemp_i$ and $\hat{E}(y_i|\mathbf{x}_i)$ as an R-squared that can be compared with the linear model R-squared. After the tobit command in Stata, $\hat{E}(y_i|\mathbf{x}_i)$ can be gotten using the ystar option for predicted values. (Unfortunately, Stata’s naming convention conflicts with the notation used in the text, as $y^*$ is used to denote the underlying latent variable, not the actual outcome.)

. predict hrsemph if d88 & hrsemp != ., ystar(0,.)
(344 missing values generated)

. corr hrsemp hrsemph
(obs=127)

             |   hrsemp  hrsemph
-------------+------------------
      hrsemp |   1.0000
     hrsemph |   0.5206   1.0000

. di (.5206)^2
.27102436


This R-squared is slightly above that for the linear model (.262), and so the Tobit does 
provide a better fit. And remember, the Tobit estimates are not chosen to maximize an 


R-squared, so the improvement in fit is effectively better. 


17.5. a. The results from OLS estimation of the linear model are given below. 


. use fringe

. reg hrbens exper age educ tenure married male white nrtheast nrthcen south union, robust

Linear regression                                 Number of obs   =        616

----------------------------------------------------------------
             |               Robust
      hrbens |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
       exper |   .0029862   .0042485    -.0053574    .0113298
         age |  -.0022495   .0041519    -.0104034    .0059043
        educ |    .082204   .0085122      .065487    .0989211
      tenure |   .0281931   .0037053     .0209164    .0354699
     married |   .0899016   .0499158    -.0081281    .1879312
        male |    .251898   .0496953     .1543015    .3494946
       white |    .098923   .0721337    -.0427402    .2405862
    nrtheast |  -.0834306   .0723545    -.2255277    .0586664
     nrthcen |  -.0492621   .0626967    -.1723922     .073868
       south |  -.0284978   .0653108    -.1567617    .0997662
       union |   .3768401   .0535136     .2717448    .4819354
       _cons |  -.6999244   .1803555    -1.054125   -.3457242
----------------------------------------------------------------


b. The Tobit estimates recognizing the corner at zero are 


. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union, ll(0)

Tobit regression                                  Number of obs   =        616
                                                  LR chi2(11)     =     283.86
Log likelihood = -519.66616

------------------------------------------------------------------------------
      hrbens |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0040631   .0046627     0.87   0.384    -.0050939    .0132201
         age |  -.0025859   .0044362    -0.58   0.560    -.0112981    .0061263
        educ |   .0869168   .0088168     9.86   0.000     .0696015    .1042321
      tenure |   .0287099   .0037237     7.71   0.000      .021397    .0360227
     married |   .1027574   .0538339     1.91   0.057    -.0029666    .2084814
        male |   .2556765   .0551672     4.63   0.000     .1473341     .364019
       white |   .0994408    .078604     1.27   0.206     -.054929    .2538106
    nrtheast |  -.0778461   .0775035    -1.00   0.316    -.2300547    .0743625
     nrthcen |  -.0489422   .0713965    -0.69   0.493    -.1891572
       south |  -.0246854   .0709243    -0.35   0.728    -.1639731
       union |   .4033519   .0522697     7.72   0.000     .3006999
       _cons |  -.8137158   .1880725    -4.33   0.000     -1.18307
-------------+----------------------------------------------------------------
      /sigma |   .5551027   .0165773                      .5225467
------------------------------------------------------------------------------
  Obs. summary:         41  left-censored observations at hrbens<=0
                       575     uncensored observations
                         0 right-censored observations


The Tobit and OLS estimates are similar because only 41 of 616 observations, or about 6.7% of the sample, have hrbens = 0. As expected, the Tobit estimates are all slightly larger in magnitude; this reflects that the scale factor is always less than unity.


c. Here is what happens when $exper^2$ and $tenure^2$ are included:

. tobit hrbens exper age educ tenure married male white nrtheast nrthcen south union expersq tenuresq, ll(0)

Tobit regression                                  Number of obs   =        616
                                                  LR chi2(13)     =     315.95
                                                  Prob > chi2     =     0.0000
Log likelihood = -503.62108                       Pseudo R2       =     0.2388

------------------------------------------------------------------------------
      hrbens |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   .0306652   .0085253     3.60   0.000     .0139224     .047408
         age |  -.0040294   .0043428    -0.93   0.354    -.0125583    .0044995
        educ |   .0802587   .0086957     9.23   0.000     .0631812    .0973362
      tenure |   .0581357   .0104947     5.54   0.000      .037525    .0787463
     married |   .0714831   .0528969     1.35   0.177    -.0324014    .1753675
        male |   .2562597   .0539178     4.75   0.000     .1503703    .3621491
       white |   .0906783   .0768576     1.18   0.239    -.0602628    .2416193
    nrtheast |  -.0480194   .0760238    -0.63   0.528     -.197323    .1012841
     nrthcen |   -.033717   .0698213    -0.48   0.629    -.1708394    .1034053
       south |   -.017479   .0693418    -0.25   0.801    -.1536597    .1187017
       union |   .3874497    .051105     7.58   0.000     .2870843    .4878151
     expersq |  -.0005524   .0001487    -3.71   0.000    -.0008445   -.0002604
    tenuresq |  -.0013291   .0004098    -3.24   0.001     -.002134   -.0005242
       _cons |  -.9436572   .1853532    -5.09   0.000    -1.307673   -.5796409
-------------+----------------------------------------------------------------
      /sigma |   .5418171   .0161572                      .5100859    .5735484
------------------------------------------------------------------------------
  Obs. summary:         41  left-censored observations at hrbens<=0
                       575     uncensored observations
                         0 right-censored observations

. test expersq tenuresq

 ( 1)  [model]expersq = 0
 ( 2)  [model]tenuresq = 0

       F(  2,   603) =   16.34
            Prob > F =    0.0000

Both squared terms are individually very statistically significant, and they are also jointly
significant. What is not clear is whether their presence would change the estimated partial
effects in important ways.


d. There are nine industries, and we use ind1 as the base industry:

tobit hrbens exper age educ tenure married male white nrtheast nrthcen south
      union expersq tenuresq ind2-ind9, ll(0)

Tobit regression                                  Number of obs   =        616
Log likelihood = -467.09766

----------------------------------------------------------------
      hrbens |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
       exper |   .0267869   .0081297     .0108205    .0427534
         age |  -.0034182   .0041306    -.0115306    .0046942
        educ |   .0789402   .0088598       .06154    .0963403
      tenure |    .053115   .0099413     .0335907    .0726393
     married |   .0547462   .0501776    -.0438005    .1532928
        male |   .2411059   .0556864     .1317401    .3504717
       white |   .1188029   .0735678    -.0256812    .2632871
    nrtheast |  -.1016799   .0721422    -.2433643    .0400045
     nrthcen |  -.0724782   .0667174    -.2035085    .0585521
       south |  -.0379854   .0655859    -.1667934    .0908226
       union |   .3143174   .0506381     .2148662    .4137686
     expersq |  -.0004405   .0001417    -.0007188   -.0001623
    tenuresq |  -.0013026   .0003863    -.0020613    -.000544
        ind2 |  -.3731778   .3742017    -1.108095    .3617389
        ind3 |  -.0963657    .368639    -.8203575    .6276261
        ind4 |  -.2351539   .3716415    -.9650425    .4947348
        ind5 |   .0209362    .373072    -.7117618    .7536342
        ind6 |  -.5083107   .3682535    -1.231545     .214924
        ind7 |   .0033643   .3739442    -.7310468    .7377154
        ind8 |  -.6107854    .376006    -1.349246     .127675
        ind9 |  -.3257878   .3669437     -1.04645    .3948746
       _cons |  -.5750527   .4137824    -1.387704    .2375989
----------------------------------------------------------------

  Obs. summary:         41  left-censored observations at hrbens<=0
                       575     uncensored observations
                         0 right-censored observations

testparm ind2-ind9

 ( 1)  [model]ind2 = 0
 ( 2)  [model]ind3 = 0
 ( 3)  [model]ind4 = 0
 ( 4)  [model]ind5 = 0
 ( 5)  [model]ind6 = 0
 ( 6)  [model]ind7 = 0
 ( 7)  [model]ind8 = 0
 ( 8)  [model]ind9 = 0

       F(  8,   595) =    9.66
            Prob > F =    0.0000



Each industry dummy variable is individually insignificant at even the 10% level, but the
joint Wald test says that they are jointly very significant. This is somewhat unusual for dummy
variables that are necessarily orthogonal (so that there is not a multicollinearity problem among
them). The likelihood ratio statistic is LR = 2(503.621 - 467.098) = 73.046, which is roughly
comparable with Q · F = 8 · 9.66 = 77.28. The p-values in both cases are essentially zero.

Several estimates on the industry dummies are economically significant, with a worker in,
say, industry eight earning about 61 cents less per hour in benefits than a comparable worker in
industry one. [In this example, with so few observations at zero, it is roughly legitimate to use
the parameter estimates as the partial effects.]
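
The comparison between the LR statistic and Q · F can be reproduced with a few lines of arithmetic. The following is only a sketch in Python (the solutions themselves use Stata); the helper name `chi2_sf_even_df` is hypothetical, and it relies on the closed-form chi-square tail probability that is available when the degrees of freedom are even.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate with an even df.

    For even df the tail has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

# Values taken from the Stata output in parts c and d.
ll_restricted = -503.621     # Tobit without industry dummies
ll_unrestricted = -467.098   # Tobit with ind2-ind9
q = 8                        # number of restrictions
f_stat = 9.66                # F statistic reported by testparm

lr = 2.0 * (ll_unrestricted - ll_restricted)
wald = q * f_stat

print(f"LR  = {lr:.3f}, p-value = {chi2_sf_even_df(lr, q):.2e}")
print(f"Q*F = {wald:.2f}, p-value = {chi2_sf_even_df(wald, q):.2e}")
```

Both tail probabilities are far below any conventional significance level, which is the sense in which the p-values are "essentially zero."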

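17.6. a. First, we can write $u_1 = \rho_1 v_2 + e_1$, where $\rho_1 = \mathrm{Cov}(v_2,u_1)$, and we use the fact
that $\mathrm{Var}(v_2) = 1$. Also, $\sigma_1^2 = \rho_1^2 + \tau_1^2$ where $\tau_1^2 = \mathrm{Var}(e_1)$. The distribution of $y_1$ given $(z, v_2)$
can be written as $g(y_1|z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2, \sigma_1^2 - \rho_1^2)$, where $y_1$ is the generic argument. Next,
we need the density of $v_2$ given $(z,y_2)$, which is given in equations (15.55) and (15.56). To
obtain the density of $y_1$ given $(z, y_2)$, we can apply Property CD.3 in Appendix 2A. The
density of $v_2|(z,y_2 = 1)$ is $\phi(v_2)/\Phi(z\delta_2)$ for $v_2 > -z\delta_2$. So the density of $y_1$ given $(z,y_2 = 1)$
is

\[
\frac{1}{\Phi(z\delta_2)} \int_{-z\delta_2}^{\infty} g(y_1|z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2, \sigma_1^2 - \rho_1^2)\,\phi(v_2)\,dv_2
\]

(where $v_2$ is just the dummy argument in the integration) and the density given $(z,y_2 = 0)$ is

\[
\frac{1}{1 - \Phi(z\delta_2)} \int_{-\infty}^{-z\delta_2} g(y_1|z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2, \sigma_1^2 - \rho_1^2)\,\phi(v_2)\,dv_2.
\]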


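b. We need to combine the density obtained from part a, call it
$f(y_1|z, y_2; \delta_1, \alpha_1, \rho_1, \sigma_1^2, \delta_2)$, with the probit density of $y_2$ given $z$, say $h(y_2|z; \delta_2)$. Actually, it is
easier to work with $\tau_1^2 = \sigma_1^2 - \rho_1^2$. Then the log-likelihood for observation $i$ is

\[
\begin{aligned}
&\log[f(y_{i1}|z_i, y_{i2}; \delta_1, \alpha_1, \rho_1, \tau_1^2, \delta_2)] + \log[h(y_{i2}|z_i; \delta_2)] \\
&\quad = y_{i2}\log\left( \frac{1}{\Phi(z_i\delta_2)} \int_{-z_i\delta_2}^{\infty} g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2} + \rho_1 v_2, \tau_1^2)\,\phi(v_2)\,dv_2 \right) \\
&\qquad + (1 - y_{i2})\log\left( \frac{1}{1 - \Phi(z_i\delta_2)} \int_{-\infty}^{-z_i\delta_2} g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2} + \rho_1 v_2, \tau_1^2)\,\phi(v_2)\,dv_2 \right) \\
&\qquad + y_{i2}\log[\Phi(z_i\delta_2)] + (1 - y_{i2})\log[1 - \Phi(z_i\delta_2)],
\end{aligned}
\]

which simplifies to

\[
\begin{aligned}
\ell_i = {} & y_{i2}\log\left( \int_{-z_i\delta_2}^{\infty} g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2} + \rho_1 v_2, \tau_1^2)\,\phi(v_2)\,dv_2 \right) \\
& + (1 - y_{i2})\log\left( \int_{-\infty}^{-z_i\delta_2} g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2} + \rho_1 v_2, \tau_1^2)\,\phi(v_2)\,dv_2 \right).
\end{aligned}
\]

If $\rho_1 = 0$, the log likelihood becomes

\[
\begin{aligned}
\ell_i(0) &= y_{i2}\log[g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)\,\Phi(z_i\delta_2)] + (1 - y_{i2})\log\{g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)[1 - \Phi(z_i\delta_2)]\} \\
&= \log[g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)] + y_{i2}\log[\Phi(z_i\delta_2)] + (1 - y_{i2})\log[1 - \Phi(z_i\delta_2)],
\end{aligned}
\]

which is two separate log-likelihoods, one the standard Tobit for $y_{i1}$ given $(z_{i1}, y_{i2})$ and the
second for probit of $y_{i2}$ given $z_i$.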

c. As in the probit case (Section 15.7.3), this is another example of a forbidden regression.
There is no way that $E(y_1|z)$ has the Tobit form with $z_1$ and $\Phi(z\delta_2) = E(y_2|z)$ as the
explanatory variables. In fact, because $y_1 = \max(0, z_1\delta_1 + \alpha_1 y_2 + u_1)$, $E(y_1|z)$ has no simple
form, although it could be computed in principle.

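d. As given in the hint, it is easiest to work with the parameterization in terms of $\tau_1^2$, as
shown in part b. Passing the derivative through the integral gives

\[
\begin{aligned}
\frac{\partial \ell_i(\theta)}{\partial \rho_1} = {} & y_{i2}\,
\frac{\int_{-z_i\delta_2}^{\infty} v_2\, g'(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2} + \rho_1 v_2, \tau_1^2)\,\phi(v_2)\,dv_2}
     {\int_{-z_i\delta_2}^{\infty} g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2} + \rho_1 v_2, \tau_1^2)\,\phi(v_2)\,dv_2} \\
& + (1 - y_{i2})\,
\frac{\int_{-\infty}^{-z_i\delta_2} v_2\, g'(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2} + \rho_1 v_2, \tau_1^2)\,\phi(v_2)\,dv_2}
     {\int_{-\infty}^{-z_i\delta_2} g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2} + \rho_1 v_2, \tau_1^2)\,\phi(v_2)\,dv_2},
\end{aligned}
\]

where $g'$ denotes the first derivative (with respect to the mean argument). When we set $\rho_1 = 0$ the first term becomes

\[
\begin{aligned}
y_{i2}\,\frac{g'(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)\int_{-z_i\delta_2}^{\infty} v_2\,\phi(v_2)\,dv_2}
            {g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)\int_{-z_i\delta_2}^{\infty} \phi(v_2)\,dv_2}
&= y_{i2}\,\frac{g'(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)}{g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)}\cdot\frac{\phi(z_i\delta_2)}{1 - \Phi(-z_i\delta_2)} \\
&= \frac{g'(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)}{g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)}\; y_{i2}\,\lambda(z_i\delta_2),
\end{aligned}
\]

where $\lambda(a) = \phi(a)/\Phi(a)$ is the inverse Mills ratio and we use the fact that $\int_{-a}^{\infty} v\phi(v)\,dv = \phi(a)$
for any $a \in \mathbb{R}$. Similarly, using $\int_{-\infty}^{-a} v\phi(v)\,dv = -\phi(a)$, the second term is

\[
-\frac{g'(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)}{g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)}\,(1 - y_{i2})\,\lambda(-z_i\delta_2),
\]

and so the partial derivative evaluated at $\rho_1 = 0$ is

\[
\frac{g'(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)}{g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)}\,[y_{i2}\lambda(z_i\delta_2) - (1 - y_{i2})\lambda(-z_i\delta_2)]
= \frac{g'(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)}{g(y_{i1}|z_{i1}\delta_1 + \alpha_1 y_{i2}, \tau_1^2)}\; gr(y_{i2}, z_i\delta_2),
\]

where $gr(y_{i2}, z_i\delta_2) = y_{i2}\lambda(z_i\delta_2) - (1 - y_{i2})\lambda(-z_i\delta_2)$ is the generalized residual. The key is that
this is the same partial derivative we would obtain by simply adding $gr_{i2} \equiv gr(y_{i2}, z_i\delta_2)$ as an
explanatory variable and giving it a coefficient, say $\eta_1$. In other words, form the artificial
model

\[
\begin{aligned}
y_{i1} &= \max(0, z_{i1}\delta_1 + \alpha_1 y_{i2} + \eta_1 gr_{i2} + e_{i1}) \\
e_{i1}|z_i, y_{i2}, gr_{i2} &\sim \mathrm{Normal}(0, \tau_1^2).
\end{aligned}
\]

Of course, under the given assumptions this "model" cannot be true when $\eta_1 \neq 0$. But if we act
as if it is true and compute the score for testing $H_0: \eta_1 = 0$, we get exactly the score derived
above. So we are led to a simple variable addition test. In the first stage estimate probit of $y_{i2}$
on $z_i$ to get $\hat{\delta}_2$. Construct the generalized residuals,

\[
\widehat{gr}_{i2} = y_{i2}\lambda(z_i\hat{\delta}_2) - (1 - y_{i2})\lambda(-z_i\hat{\delta}_2).
\]

In the second step, estimate a Tobit model of $y_{i1}$ on $z_{i1}$, $y_{i2}$, $\widehat{gr}_{i2}$ and use a $t$ test for the
coefficient $\hat{\eta}_1$ on $\widehat{gr}_{i2}$. Under the null, the statistic has an asymptotic Normal(0, 1) distribution,
with no need to adjust the standard error for estimation of $\delta_2$.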
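
The generalized residuals used in the first stage are simple to compute once the probit index is available. The sketch below is in Python rather than Stata and uses only the standard library; the index values and outcomes are made-up numbers for illustration, not estimates from any data set in the text.

```python
import math

def norm_pdf(a):
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def norm_cdf(a):
    """Standard normal cdf, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))

def inverse_mills(a):
    """lambda(a) = phi(a) / Phi(a)."""
    return norm_pdf(a) / norm_cdf(a)

def generalized_residual(y2, index):
    """gr = y2*lambda(index) - (1 - y2)*lambda(-index) for a binary y2."""
    return y2 * inverse_mills(index) - (1 - y2) * inverse_mills(-index)

# Hypothetical first-stage probit indexes z_i*delta2_hat and binary outcomes.
indexes = [-1.2, 0.0, 0.4, 1.5]
outcomes = [0, 1, 0, 1]
gr_hat = [generalized_residual(y, a) for y, a in zip(outcomes, indexes)]
print(gr_hat)
```

The resulting values would then enter the second-stage Tobit as one extra regressor. A useful check is that, at the true index, the generalized residual has conditional mean zero: $\Phi(a)\lambda(a) - [1 - \Phi(a)]\lambda(-a) = 0$.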

Incidentally, while adding the generalized residual (which acts as a kind of control
function) does not generally solve the endogeneity of $y_2$ under the assumptions of this
problem, it might be a decent approximation. It is likely to do well when $\rho_1$ is "close" to zero
(although we then must wonder how much of a problem endogeneity is in the first place). There
is some evidence that it can work well as an approximation more generally, where focus would
be on average partial effects. Putting in flexible functions of $\widehat{gr}_{i2}$, such as low-order
polynomials, can help even more.

If we simply assert that $D(y_1|z_1, y_2, gr_2)$ follows the Tobit model given above then adding
$\widehat{gr}_{i2}$ does produce consistent estimators of all parameters and average partial effects (by
averaging out $\widehat{gr}_{i2}$ in the partial effect formulas for the standard Tobit). This idea is
nontraditional but is in the spirit of viewing all models simply as approximations.


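17.7. Let $s = 1[y > 0]$ and use Property CV.3 about conditional variances (see Appendix
2.A.2):

\[
\mathrm{Var}(y|x) = E[\mathrm{Var}(y|x, s)|x] + \mathrm{Var}[E(y|x, s)|x].
\]

Now because $y = s \cdot w^*$,

\[
\begin{aligned}
E(y|x, s) &= s \cdot E(w^*|x,s) = s \cdot E(w^*|x) = s \cdot \exp(x\beta) \\
\mathrm{Var}(y|x,s) &= s^2 \cdot \mathrm{Var}(w^*|x,s) = s \cdot \mathrm{Var}(w^*|x) = s \cdot \eta^2[\exp(x\beta)]^2
\end{aligned}
\]

and so

\[
\begin{aligned}
\mathrm{Var}(y|x) &= E[s \cdot \eta^2 \exp(2x\beta)|x] + \mathrm{Var}[s \cdot \exp(x\beta)|x] \\
&= P(s = 1|x)\,\eta^2 \exp(2x\beta) + \mathrm{Var}(s|x) \exp(2x\beta) \\
&= \eta^2\Phi(x\gamma) \exp(2x\beta) + \Phi(x\gamma)[1 - \Phi(x\gamma)] \exp(2x\beta).
\end{aligned}
\]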
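
As a check on the algebra, the final expression can be compared with $E(y^2|x) - [E(y|x)]^2$ computed directly. The Python sketch below is only illustrative: the function names and the numerical values of $x\beta$, $x\gamma$, and $\eta^2$ are arbitrary choices, and it assumes, as in the derivation, that the first two conditional moments of $w^*$ do not depend on $s$.

```python
import math

def norm_cdf(a):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))

def hurdle_variance(xb, xg, eta2):
    """Var(y|x) from the formula derived via the variance decomposition."""
    p = norm_cdf(xg)
    return eta2 * p * math.exp(2.0 * xb) + p * (1.0 - p) * math.exp(2.0 * xb)

def hurdle_variance_from_moments(xb, xg, eta2):
    """Var(y|x) = E(y^2|x) - [E(y|x)]^2, using y = s*w* and s^2 = s."""
    p = norm_cdf(xg)
    mean_w = math.exp(xb)
    second_moment_w = eta2 * mean_w ** 2 + mean_w ** 2   # Var(w*|x) + mean^2
    return p * second_moment_w - (p * mean_w) ** 2

# Arbitrary illustrative values of x*beta, x*gamma, and eta^2.
for xb, xg, eta2 in [(0.0, 0.0, 1.0), (0.5, -0.3, 0.25), (-1.0, 1.2, 2.0)]:
    print(hurdle_variance(xb, xg, eta2),
          hurdle_variance_from_moments(xb, xg, eta2))
```

The two columns printed agree up to floating-point rounding, as the decomposition requires.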


17.8. a. For model (1) simply use ordinary least squares. Under the conditional mean
assumption, we could use a weighted least squares procedure if we suspect heteroskedasticity,
as we might, and have a particular form in mind. However, we should probably not think of the
linear model as a model of $E(y|x)$; rather, it is simply the linear projection. If we use a WLS
procedure, we are effectively estimating a linear predictor in weighted variables.

For model (2) we could use nonlinear regression, or weighted nonlinear regression. The
latter is attractive because of probable heteroskedasticity in $\mathrm{Var}(y|x)$. We might use a variance
function proportional to $\exp(x\beta)$ or $[\exp(x\beta)]^2$, or a quadratic in the mean function:
$\mathrm{Var}(y|x) = \delta_0 + \delta_1 \exp(x\beta) + \delta_2[\exp(x\beta)]^2$, which contains the previous two as special cases.
We can estimate the $\delta_j$ from the OLS regression $\hat{u}_i^2$ on 1, $\hat{y}_i$, and $\hat{y}_i^2$, where the hatted quantities
are from a first stage NLS estimation. The fitted values are the estimated conditional variances
(and we might have to worry about whether they are all strictly positive). Other attractive
options are Poisson regression (see Chapter 18 for a description of its robustness properties
for estimating $E(y|x)$) or regression using the Exponential quasi-log-likelihood (see Chapter
18).

Naturally, for model (3) we would use MLE. 

b. We can compute an R-squared type measure any time we directly model $E(y|x)$ or we
have an implied model for $E(y|x)$ (such as in the Tobit case). In each case, we obtain the fitted
values, $\hat{E}(y_i|x_i)$, $i = 1,\dots,N$. Once we have fitted values we can obtain the squared correlation
between $y_i$ and $\hat{E}(y_i|x_i)$. These can be compared across different models and even estimation
methods. Alternatively, one can use a sum-of-squared residuals form:

\[
R^2 = 1 - \frac{\sum_{i=1}^N [y_i - \hat{E}(y_i|x_i)]^2}{\sum_{i=1}^N (y_i - \bar{y})^2}.
\]

In the linear regression case with an intercept, the two ways of computing R-squared are
identical, but the equivalence does not hold in general. In fact, the SSR version of R-squared
can be negative in some cases. One can always compute an "adjusted" R-squared, too:

\[
\bar{R}^2 = 1 - \frac{(N - P)^{-1}\sum_{i=1}^N [y_i - \hat{E}(y_i|x_i)]^2}{(N - 1)^{-1}\sum_{i=1}^N (y_i - \bar{y})^2},
\]

where $P$ is the number of estimated parameters in the mean function.
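
The three goodness-of-fit measures are easy to compute from any vector of fitted values. The following Python sketch uses made-up data and hypothetical function names; it is meant only to show that the squared-correlation and SSR versions can disagree, and that the SSR version can be negative.

```python
def r2_squared_corr(y, fitted):
    """Squared sample correlation between y and the fitted values."""
    n = len(y)
    ybar = sum(y) / n
    fbar = sum(fitted) / n
    cov = sum((a - ybar) * (b - fbar) for a, b in zip(y, fitted))
    var_y = sum((a - ybar) ** 2 for a in y)
    var_f = sum((b - fbar) ** 2 for b in fitted)
    return cov ** 2 / (var_y * var_f)

def r2_ssr(y, fitted):
    """1 - SSR/SST; can be negative when the fit is poor."""
    ybar = sum(y) / len(y)
    ssr = sum((a - b) ** 2 for a, b in zip(y, fitted))
    sst = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ssr / sst

def r2_adjusted(y, fitted, n_params):
    """Adjusted version with P = n_params parameters in the mean function."""
    n = len(y)
    ybar = sum(y) / n
    ssr = sum((a - b) ** 2 for a, b in zip(y, fitted))
    sst = sum((a - ybar) ** 2 for a in y)
    return 1.0 - (ssr / (n - n_params)) / (sst / (n - 1))

# Made-up outcomes with one sensible and one badly misaligned set of fits.
y = [0.0, 1.0, 2.0, 3.0]
good_fit = [0.5, 0.5, 2.5, 2.5]
bad_fit = [3.0, 2.0, 1.0, 0.0]

print(r2_squared_corr(y, good_fit), r2_ssr(y, good_fit), r2_adjusted(y, good_fit, 2))
print(r2_squared_corr(y, bad_fit), r2_ssr(y, bad_fit))
```

For the second set of fitted values the squared correlation is one while the SSR version is negative, which is why the two measures should not be treated as interchangeable outside linear regression with an intercept.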

c. This is clear from equation (17.20). If $y_i > 0$ for $i = 1,\dots,N$, then only the second term
in the log likelihood appears. But that is just the log likelihood for the classical linear
regression model where $y_i|x_i \sim \mathrm{Normal}(x_i\beta, \sigma^2)$. It is well known that the MLE of $\beta$ in this
case is the OLS estimator.

It may seem a bit odd, but if we truly believe the population follows a Tobit model (and
just happen to obtain a sample where $y_i > 0$ for all $i$) then the appropriate estimate of $E(y|x)$
is obtained from (17.14), where we plug in the usual OLS estimators for $\beta$ and $\sigma^2$. Estimates of
$E(y|x)$ computed in this way would ensure that fitted values in the sample are all positive, even
though $x_i\hat{\beta}$ could be negative for some $i$.

d. If $y > 0$ in the population, a Tobit model makes no sense because $P(y = 0) > 0$ for a
Tobit model. Instead, we could assume $E[\log(y)|x] = x\gamma$, or, equivalently, $\log(y) = x\gamma + v$,
$E(v|x) = 0$. If we make the stronger assumption that $v$ is independent of $x$, then
$E(y|x) = \eta \exp(x\gamma)$, where $\eta = E[\exp(v)] > 1$. After estimating $\gamma$ from the OLS regression
$\log(y_i)$ on $x_i$, $i = 1,\dots,N$, we can estimate $\eta$ using Duan's (1983) estimator, as in Problem
17.1:

\[
\hat{\eta} = N^{-1} \sum_{i=1}^{N} \exp(\hat{v}_i),
\]

where the $\hat{v}_i$ are the OLS residuals.
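
A minimal sketch of the smearing step, written in Python with made-up residuals (the function names are hypothetical), may help fix ideas. Because OLS residuals from a regression with an intercept sum to zero, Jensen's inequality implies the estimate is at least one.

```python
import math

def duan_smearing_factor(residuals):
    """Average of exp(v_hat_i) over the OLS residuals from the log regression."""
    return sum(math.exp(v) for v in residuals) / len(residuals)

def predict_level(x_gamma_hat, eta_hat):
    """Estimated E(y|x) = eta_hat * exp(x*gamma_hat)."""
    return eta_hat * math.exp(x_gamma_hat)

# Hypothetical residuals from regressing log(y) on x (they sum to zero).
residuals = [math.log(2.0), -math.log(2.0)]
eta_hat = duan_smearing_factor(residuals)

# Fitted mean at a hypothetical index x*gamma_hat = 0.3.
print(eta_hat, predict_level(0.3, eta_hat))
```

Simply exponentiating the fitted log value, without the factor $\hat{\eta}$, would systematically underpredict $E(y|x)$.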

17.9. a. A two-limit Tobit model, of the kind analyzed in Problem 17.3, is appropriate, with
$a_1 = 0$, $a_2 = 10$.

b. The lower limit at zero is logically necessary considering the kind of response: the 
smallest percentage of one’s income that can be invested in a pension plan is zero. On the 
other hand, the upper limit of 10 is an arbitrary corner imposed by law. One can imagine that 
some people at the corner y = 10 would choose y > 10 if they could. So, we can think of an 
underlying variable, which would be the percentage invested in the absence of any restrictions. 
Then, there would be no upper bound required (since we would not have to worry about 100 
percent of income being invested in a pension plan). 

17.10. A more general version of this problem is done in Problem 17.3, part f: set $a_1 = 0$
and let $a_2 \to \infty$.

17.11. No. OLS always consistently estimates the parameters of a linear projection,
provided the second moments of $y$ and the $x_j$ are finite and $\mathrm{Var}(x)$ has full rank $K$, regardless
of the nature of $y$ or $x$ (discrete, continuous, some mixture). The fact that we can always
consistently estimate a linear projection by OLS is why linear regression analysis is always a
reasonable step for discrete outcomes (provided there is no data censoring problem of the type
we discuss in Chapter 19). As discussed in Chapters 15 and 17, the linear regression
coefficients often are close to estimated average partial effects from more complicated models.
See Problem 17.4 part b for an example.


17.12. a. 248 out of 660, or about 37.6%, have $ecolbs_i = 0$. The positive responses range
from .333 to a high of 42, but with focal points at integer values, especially one pound and two
pounds. Therefore, a Tobit model cannot literally be true, but it can still lead to good estimates
of the conditional mean and partial effects.


b. The linear model results are given below:

use apple
gen lecoprc = log(ecoprc)
gen lregprc = log(regprc)
gen lfaminc = log(faminc)
reg ecolbs lecoprc lregprc lfaminc educ hhsize num5_17

      Source |       SS       df       MS              Number of obs =     660
-------------+------------------------------
       Model |  155.149478     6  25.8582463
    Residual |  4048.98735   653  6.20059318
-------------+------------------------------
       Total |  4204.13682   659   6.3795703

------------------------------------------------------------------------------
      ecolbs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lecoprc |   -2.56959   .5865181                                 -1.417901
     lregprc |   2.204184   .5903005                                    3.3633
     lfaminc |    .203861    .155967                                  .5101184
        educ |
      hhsize |
     num5_17 |
       _cons |
------------------------------------------------------------------------------

The price coefficients are of the expected sign: there is a negative own price effect, and a
positive price effect for the substitute good, regular apples. The coefficient on log(ecoprc)
implies that a 10% increase in ecoprc leads to a fall in estimated demand of about .26 lbs. At
the mean value of ecolbs, about 1.47 lbs, this is an estimated own price elasticity of
-2.57/1.47 = -1.75, which is very large in magnitude.
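
The two calculations in this paragraph follow from the level-log functional form. The short Python sketch below simply redoes the arithmetic with the rounded values quoted in the text; the comparison with the exact change in the log is an added illustration, not something reported in the solution.

```python
import math

beta_lecoprc = -2.57    # rounded OLS coefficient on log(ecoprc)
mean_ecolbs = 1.47      # rounded sample mean of ecolbs

# Level-log model: change in ecolbs is roughly beta * (change in log price).
approx_change = beta_lecoprc * 0.10            # uses d log(p) ~ 0.10
exact_change = beta_lecoprc * math.log(1.10)   # uses the exact log change

# Elasticity at the mean: (d ecolbs / d log p) / ecolbs.
elasticity = beta_lecoprc / mean_ecolbs

print(round(approx_change, 3), round(exact_change, 3), round(elasticity, 2))
```

The approximation based on a .10 change in the log gives the .26 lbs figure; using the exact log change gives a slightly smaller fall.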

c. The test for heteroskedasticity is given below. The F statistic, which is asymptotically
valid as a test for heteroskedasticity, gives a pretty large p-value, .362, so this test does not
find much evidence of heteroskedasticity.


predict ecolbsh
(option xb assumed; fitted values)

gen ecolbshsq = ecolbsh^2
predict uh, resid
gen uhsq = uh^2
reg uhsq ecolbsh ecolbshsq

      Source |       SS       df       MS              Number of obs =     660
-------------+------------------------------           F(  2,   657) =    1.02
       Model |  8923.31842     2  4461.65921           Prob > F      =  0.3620
    Residual |  2880416.28   657  4384.19525           R-squared     =  0.0031
-------------+------------------------------           Adj R-squared =  0.0001
       Total |   2889339.6   659  4384.43034           Root MSE      =  66.213

------------------------------------------------------------------------------
        uhsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ecolbsh |   32.61476   31.09945     1.05   0.295    -28.45153    93.68105
   ecolbshsq |    -8.9604   10.32346    -0.87   0.386    -29.23136    11.31056
       _cons |  -20.36486   21.92073    -0.93   0.353    -63.40798    22.67827
------------------------------------------------------------------------------

d. The fitted values were already obtained in part c. The summary statistics are

sum ecolbs ecolbsh

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      ecolbs |       660     1.47399    2.525781          0         42
     ecolbsh |       660     1.47399     .485213   .2251952   2.598743

count if ecolbs < 2.6
  541

di 541/660
.81969697


The smallest fitted value is .225, and so none are negative. The largest fitted value is only
about 2.6, but about 82 percent of the observations have $ecolbs_i$ below 2.6. Generally, it is
difficult to find models that will track such a wide range in actual outcomes. Further, one
might suspect the largest value, 42, is a mistake or an outlier. (The estimates with this one
observation dropped give a similar story, but the price coefficients shrink in magnitude.)

e. The Tobit results are given below. The signs are the same as for the linear model, with
the price and income variables being more statistically significant for Tobit. We know that the
coefficients need to be scaled down in order to obtain the partial effects. That the Tobit
coefficient on log(ecoprc) is about double the OLS estimate is not surprising, and we need to
compute a scale factor. The scale factor for the APEs (of continuous explanatory variables) is
about .547. If we multiply each Tobit coefficient by .547, we get fairly close to the OLS
estimates.


tobit ecolbs lecoprc lregprc lfaminc educ hhsize num5_17, ll(0)

Tobit regression                                  Number of obs   =        660
                                                  LR chi2(6)      =      50.79
                                                  Prob > chi2     =     0.0000
Log likelihood = -1265.7088                       Pseudo R2       =     0.0197

------------------------------------------------------------------------------
      ecolbs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lecoprc |  -5.238074   .8748606    -5.99   0.000    -6.955949     -3.5202
     lregprc |   4.261536   .8890055     4.79   0.000     2.515887    6.007185
     lfaminc |   .4149175   .2363235     1.76   0.080    -.0491269    .8789619
        educ |   .1005481    .068439     1.47   0.142    -.0338386    .2349348
      hhsize |   .0330173   .1325415     0.25   0.803    -.2272409    .2932756
     num5_17 |   .2260429   .1970926     1.15   0.252    -.1609678    .6130535
       _cons |  -1.917668   1.160126    -1.65   0.099    -4.195689    .3603525
-------------+----------------------------------------------------------------
      /sigma |   3.445719   .1268015                      3.196732    3.694706
------------------------------------------------------------------------------

  Obs. summary:        248  left-censored observations at ecolbs<=0
                       412     uncensored observations
                         0 right-censored observations

predict xbh, xb
gen prob = normal(xbh/_b[/sigma])
sum prob

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        prob |       660    .5472506    .1152633   .2610003   .8191264
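
The APE scale factor computed by the three Stata commands above is just the sample average of $\Phi(x_i\hat{\beta}/\hat{\sigma})$. A minimal Python sketch of the same calculation follows; the function name and the small list of index values are hypothetical, while the coefficient and the reported mean of `prob` are taken from the output.

```python
import math

def norm_cdf(a):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))

def ape_scale_factor(indexes, sigma):
    """Average of Phi(x_i*beta_hat / sigma_hat) over the sample."""
    return sum(norm_cdf(xb / sigma) for xb in indexes) / len(indexes)

# Hypothetical fitted indexes x_i*beta_hat, only to show the call.
example_scale = ape_scale_factor([-1.0, 0.2, 0.9, 1.8], sigma=3.445719)
print(example_scale)

# Scaling the Tobit coefficient on lecoprc by the reported mean of prob.
reported_scale = 0.5472506
b_lecoprc_tobit = -5.238074
ape_lecoprc = reported_scale * b_lecoprc_tobit
print(round(ape_lecoprc, 3))
```

The scaled coefficient is about -2.87, compared with the OLS estimate of about -2.57.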


f. This question is a bit ambiguous. I will evaluate the partial effect at the mean values of
ecoprc, regprc, and faminc, and then take the log, rather than averaging the logs. The scale
factor for the APEs is given in part e: .547. The scale factor for the partial effects at the mean
is .539, which is fairly close. The PEA of lecoprc is about -2.82, which is somewhat bigger in
magnitude than the OLS estimate, -2.57.

To get the estimated elasticity, we need to estimate E(ecolbs|x) at the mean values of the
covariates; we get about 1.55. So the estimated elasticity at the mean values of the covariates
is about -2.82/1.55 ≈ -1.82. This is slightly larger in magnitude than that computed for the
linear model, -1.75.


sum ecoprc regprc faminc educ hhsize num5_17

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      ecoprc |       660    1.081515     .295573        .59       1.59
      regprc |       660    .8827273    .2444687        .59       1.19
      faminc |       660    53.40909    35.74122          5        250
        educ |       660    14.38182    2.274014          8         20
      hhsize |       660    2.940909    1.526049          1          9
-------------+--------------------------------------------------------
     num5_17 |       660    .6212121     .994143          0          6

di normal((_b[_cons] + _b[lecoprc]*log(1.081515)
      + _b[lregprc]*log(.8827273) + _b[lfaminc]*log(53.40909)
      + _b[educ]*14.38182 + _b[hhsize]*2.940909)/_b[/sigma])
.53860761

di .53860761*_b[lecoprc]
-2.8212668

di _b[_cons] + _b[lecoprc]*log(1.081515)
      + _b[lregprc]*log(.8827273) + _b[lfaminc]*log(53.40909)
      + _b[educ]*14.38182 + _b[hhsize]*2.940909
.33398136

di normalden((_b[_cons] + _b[lecoprc]*log(1.081515)
      + _b[lregprc]*log(.8827273) + _b[lfaminc]*log(53.40909)
      + _b[educ]*14.38182 + _b[hhsize]*2.940909)/_b[/sigma])
.3970727

di .33398136*.53860761 + _b[/sigma]*.3970727
1.5480857

di -2.82/1.55
-1.8193548
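
The sequence of `di` commands evaluates the Tobit formulas $\Phi(x\beta/\sigma)$ and $E(y|x) = \Phi(x\beta/\sigma)\,x\beta + \sigma\phi(x\beta/\sigma)$ at one point. As a cross-check, the same numbers can be reproduced in a few lines of Python; this is a sketch that takes the index value .33398136 and the estimates of $\sigma$ and the lecoprc coefficient from the Stata output as given, and the function name is hypothetical.

```python
import math

def norm_pdf(a):
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def norm_cdf(a):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))

def tobit_mean(xb, sigma):
    """E(y|x) for the standard Tobit: Phi(xb/s)*xb + s*phi(xb/s)."""
    z = xb / sigma
    return norm_cdf(z) * xb + sigma * norm_pdf(z)

xb_at_means = 0.33398136   # index evaluated as in the di command
sigma_hat = 3.445719       # /sigma from the Tobit output
b_lecoprc = -5.238074      # Tobit coefficient on lecoprc

scale_at_means = norm_cdf(xb_at_means / sigma_hat)
pea_lecoprc = scale_at_means * b_lecoprc
mean_at_means = tobit_mean(xb_at_means, sigma_hat)
elasticity = pea_lecoprc / mean_at_means

print(scale_at_means, pea_lecoprc, mean_at_means, elasticity)
```

Using unrounded intermediate values gives an elasticity of about -1.82, in line with the rounded calculation -2.82/1.55.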


g. Dropping log(regprc) greatly reduces the magnitude of the coefficient on log(ecoprc):
from -5.24 to -1.82. A standard omitted variable analysis in a linear context suggests a
positive correlation between lecoprc and lregprc. In fact, they are very highly positively
correlated, with a correlation of about .82. This high correlation was built in as part of the
experimental design.


tobit ecolbs lecoprc lfaminc educ hhsize num5_17, ll(0)

Tobit regression                                  Number of obs   =        660
Log likelihood = -1277.3043

------------------------------------------------------------------------------
      ecolbs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lecoprc |  -1.822712   .5044411    -3.61   0.000    -2.813229   -.8321952
     lfaminc |   .3931692   .2395441     1.64   0.101    -.0771978    .8635362
        educ |   .1169085   .0692025     1.69   0.092    -.0189769     .252794
      hhsize |   .0222283   .1340901     0.17   0.868    -.2410699    .2855266
     num5_17 |   .2474529   .1996317     1.24   0.216    -.1445424    .6394483
       _cons |  -2.873156   1.161745    -2.47   0.014    -5.154351   -.5919615
-------------+----------------------------------------------------------------
      /sigma |   3.499092                                 3.245569
------------------------------------------------------------------------------

  Obs. summary:        248  left-censored observations at ecolbs<=0
                       412     uncensored observations
                         0 right-censored observations

. corr lecoprc lregprc
(obs=660)

             |  lecoprc  lregprc
-------------+------------------
     lecoprc |   1.0000
     lregprc |   0.8205   1.0000


h. In fact, the Tobit model with prices in level form, rather than logarithms, fits a bit better
(log-likelihood = -1,263.37 versus -1,265.71).

tobit ecolbs ecoprc regprc lfaminc educ hhsize num5_17, ll(0)

Tobit regression                                  Number of obs   =        660
                                                  LR chi2(6)      =      55.47
                                                  Prob > chi2     =     0.0000
Log likelihood = -1263.3702                       Pseudo R2       =     0.0215

------------------------------------------------------------------------------
      ecolbs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ecoprc |  -5.649516    .887358    -6.37   0.000    -7.391931   -3.907102
      regprc |   5.575299   1.063999     5.24   0.000     3.486032    7.664566
     lfaminc |   .4195658   .2354371     1.78   0.075    -.0427381    .8818696
        educ |   .1002944   .0681569     1.47   0.142    -.0335384    .2341271
      hhsize |   .0264861   .1320183     0.20   0.841    -.2327448    .2857171
     num5_17 |   .2351291   .1963111     1.20   0.231    -.1503469    .6206051
       _cons |  -1.632596   1.314633    -1.24   0.215    -4.214007    .9488146
-------------+----------------------------------------------------------------
      /sigma |   3.431504   .1262031                      3.183692    3.679316
------------------------------------------------------------------------------

  Obs. summary:        248  left-censored observations at ecolbs<=0
                       412     uncensored observations
                         0 right-censored observations


17.13. This extension has no practical effect on how we estimate an unobserved effects
Tobit or probit model, or how we estimate a variety of unobserved effects panel data models
with conditional normal heterogeneity. We simply have

\[
c_i = -\Big(T^{-1}\sum_{t=1}^{T} \pi_t\Big)\xi + \bar{x}_i\xi + a_i \equiv \psi + \bar{x}_i\xi + a_i,
\]

where $\psi \equiv -(T^{-1}\sum_{t=1}^{T} \pi_t)\xi$. Of course, any aggregate time dummies explicitly get swept out of
$\bar{x}_i$ but they would usually be included in the equation.

An interesting follow-up question is: What if we standardize each $x_{it}$ by its cross-sectional
mean and variance at time $t$, and assume $c_i$ is related to the mean and variance of the
standardized vectors? In other words, let $z_{it} \equiv (x_{it} - \pi_t)\Omega_t^{-1/2}$, $t = 1,\dots,T$, for each random
draw $i$ from the population, where $\Omega_t \equiv \mathrm{Var}(x_{it})$. Then, we might assume

\[
c_i|x_i \sim \mathrm{Normal}(\psi + \bar{z}_i\xi, \sigma_a^2)
\]

(where, again, $z_{it}$ would not contain aggregate time dummies). This is the kind of scenario that
is handled by Chamberlain's more general assumption concerning the relationship between $c_i$
and $x_i$: $c_i = \psi + \sum_{r=1}^{T} x_{ir}\lambda_r + a_i$, where $\lambda_r = \Omega_r^{-1/2}\xi/T$, $r = 1,2,\dots,T$. Alternatively, one could
estimate $\pi_t$ and $\Omega_t$ for each $t$ using the cross section observations $\{x_{it} : i = 1,2,\dots,N\}$. The
usual sample means and sample variance matrices, say $\hat{\pi}_t$ and $\hat{\Omega}_t$, are consistent and
$\sqrt{N}$-asymptotically normal. Then, form $\hat{z}_{it} \equiv (x_{it} - \hat{\pi}_t)\hat{\Omega}_t^{-1/2}$, and proceed with the usual Tobit
(or probit) unobserved effects analysis that includes the time averages $\bar{\hat{z}}_i = T^{-1}\sum_{t=1}^{T} \hat{z}_{it}$. This is
a simple two-step estimation method, but accounting for the sample variation in $\hat{\pi}_t$ and $\hat{\Omega}_t$
analytically would be cumbersome. The panel bootstrap is an attractive alternative. Or, it may
be possible to use a much larger sample to obtain $\hat{\pi}_t$ and $\hat{\Omega}_t$, in which case one might ignore
the sampling error in the first-stage estimates.

17.14. a. Because heteroskedasticity is only in the distribution of $a_i$ given $x_i$, the density of
$y_{it}$ given $(x_i, a_i)$ is the same as that implied by (17.75) and (17.76), namely,

\[
y_{it}|x_i, a_i \sim \mathrm{Tobit}(\psi + x_{it}\beta + \bar{x}_i\xi + a_i, \sigma_u^2).
\]

b. Let $f_t(y_t|x_i, a; \eta)$ denote the Tobit density of $y_{it}|x_i, a_i$ implied by part a, where $\eta$ contains
$\beta$, $\psi$, $\xi$, and $\sigma_u^2$. Then, under (17.78),

\[
f(y_1,\dots,y_T|x_i, a; \eta) = \prod_{t=1}^{T} f_t(y_t|x_i, a; \eta).
\]

Therefore, to obtain $f(y_1,\dots,y_T|x_i; \theta)$, we integrate out $a_i$:

\[
f(y_1,\dots,y_T|x_i; \theta) = \int_{-\infty}^{\infty} \Big[\prod_{t=1}^{T} f_t(y_t|x_i, a; \eta)\Big] h(a|x_i; \lambda, \sigma_a^2)\,da, \qquad (17.91)
\]

where $h(a|x_i; \lambda, \sigma_a^2)$ denotes the normal density with mean zero and variance $\sigma_a^2 \exp(\bar{x}_i\lambda)$. The
log-likelihood is obtained by plugging the $y_{it}$ into (17.91) and taking the log.

c. The starting point is still equation (17.79), but the calculation of
$E[m(\psi + x_t\beta + \bar{x}_i\xi + a_i, \sigma_u^2)|x_i]$ is complicated by the heteroskedasticity in $\mathrm{Var}(a_i|x_i)$.
Nevertheless, essentially the same argument used on page 542 shows that

\[
E[m(\psi + x_t\beta + \bar{x}_i\xi + a_i, \sigma_u^2)|x_i] = m[\psi + x_t\beta + \bar{x}_i\xi, \sigma_a^2 \exp(\bar{x}_i\lambda) + \sigma_u^2].
\]

Given the MLEs, we can estimate the APEs from the average structural function:

\[
\widehat{ASF}(x_t) = N^{-1} \sum_{i=1}^{N} m[\hat{\psi} + x_t\hat{\beta} + \bar{x}_i\hat{\xi}, \hat{\sigma}_a^2 \exp(\bar{x}_i\hat{\lambda}) + \hat{\sigma}_u^2];
\]

we would compute changes or derivatives with respect to the elements of $x_t$. Incidentally, if we
drop assumption (17.78), we could use a pooled heteroskedastic Tobit procedure, and still
consistently estimate the APEs.
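
The average structural function is a simple average once the MLEs are in hand. The Python sketch below is illustrative only: it takes $m(z, \sigma^2) = \Phi(z/\sigma)z + \sigma\phi(z/\sigma)$, the Tobit mean function for a corner at zero, and all parameter values, index values, and function names are made up for the example.

```python
import math

def norm_pdf(a):
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)

def norm_cdf(a):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))

def tobit_mean(index, variance):
    """m(z, s^2) = Phi(z/s)*z + s*phi(z/s), the mean of max(0, z + e), e ~ N(0, s^2)."""
    s = math.sqrt(variance)
    return norm_cdf(index / s) * index + s * norm_pdf(index / s)

def asf(xt_beta, psi, xbar_xi, xbar_lambda, sigma2_a, sigma2_u):
    """Average m[psi + x_t*b + xbar_i*xi, sigma2_a*exp(xbar_i*lam) + sigma2_u] over i.

    xbar_xi and xbar_lambda hold the scalar indexes xbar_i*xi_hat and
    xbar_i*lambda_hat for each cross-section unit i.
    """
    terms = [
        tobit_mean(psi + xt_beta + m_i, sigma2_a * math.exp(v_i) + sigma2_u)
        for m_i, v_i in zip(xbar_xi, xbar_lambda)
    ]
    return sum(terms) / len(terms)

# Hypothetical estimates and time-average indexes for three units.
value = asf(xt_beta=0.3, psi=0.1,
            xbar_xi=[-0.2, 0.0, 0.4], xbar_lambda=[0.1, -0.3, 0.2],
            sigma2_a=0.5, sigma2_u=1.5)
print(value)
```

An APE for a continuous element of $x_t$ could then be approximated by evaluating the function at two nearby values of $x_t\hat{\beta}$ and differencing.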


17.15. a. The Stata output is given below. The value of the log-likelihood is -17,599.96.

use cps91
tobit hours nwifeinc educ exper expersq age kidlt6 kidge6, ll(0)

Tobit regression                                  Number of obs   =       5634
                                                  LR chi2(7)      =     645.55
                                                  Prob > chi2     =     0.0000
Log likelihood = -17599.958                       Pseudo R2       =     0.0180

------------------------------------------------------------------------------
       hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.2444726   .0165886   -14.74   0.000    -.2769926   -.2119525
        educ |  -6.064707   22.73817    -0.27   0.790    -50.64029    38.51087
       exper |  -8.234015   22.74967    -0.36   0.717    -52.83214    36.36411
     expersq |  -.0178206   .0041379    -4.31   0.000    -.0259325   -.0097087
         age |    8.53901   22.73703     0.38   0.707    -36.03435    53.11237
      kidlt6 |   -14.0809    1.21084   -11.63   0.000    -16.45461   -11.70719
      kidge6 |  -1.593786    1.09917    -1.45   0.147    -3.748583    .5610116
       _cons |  -56.32579   136.3411    -0.41   0.680    -323.6069    210.9553
-------------+----------------------------------------------------------------
      /sigma |   28.90194   .3998526                      28.11807     29.6858
------------------------------------------------------------------------------

  Obs. summary:       2348  left-censored observations at hours<=0
                      3286     uncensored observations
                         0 right-censored observations

b. The lognormal hurdle model, which has eight more parameters than the Tobit model,
does fit better in this application. The log likelihood, which properly accounts for the fact that
the linear regression for $\log(hours_i)$ is to be viewed as MLE for D(hours|x, hours > 0), is
about -16,987.50. The contribution of the probit is about -3,538.41 and the contribution of
the lognormal distribution conditional on hours > 0 is -13,449.09. The log likelihood for the
Tobit is -17,599.96.
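
The bookkeeping behind these figures is worth spelling out: the Gaussian log likelihood for $\log(hours)$ must be shifted by the Jacobian term $\sum \log(hours_i)$ to be a log likelihood for hours itself. The Python sketch below only redoes that arithmetic, using numbers reported in the Stata output that follows.

```python
# Log likelihoods reported by Stata.
ll_probit = -3538.4086            # probit for inlf
ll_linear_log_hours = -1954.9002  # Gaussian MLE for log(hours), hours > 0
ll_tobit = -17599.958             # standard Tobit from part a

# Sum of log(hours) over the 3,286 uncensored observations (3286 * mean).
sum_log_hours = 11494.189

# Change of variables from log(hours) to hours subtracts sum of log(hours).
ll_lognormal = ll_linear_log_hours - sum_log_hours
ll_hurdle = ll_probit + ll_lognormal

print(round(ll_lognormal, 3), round(ll_hurdle, 3), ll_hurdle > ll_tobit)
```

Only after this adjustment is the hurdle log likelihood comparable with the Tobit value, since both then refer to the distribution of hours given x.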


probit inlf nwifeinc educ exper expersq age kidlt6 kidge6

Probit regression                                 Number of obs   =       5634
Log likelihood = -3538.4086

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0091475   .0006759            0.000    -.0104722   -.0078227
        educ |  -.0626136   .9045369            0.945    -1.835473    1.710246
       exper |   -.157161   .9050879            0.862    -1.931101    1.616779
     expersq |  -.0005574   .0001713            0.001     -.000893   -.0002217
         age |   .1631286   .9044966            0.857    -1.609652    1.935909
      kidlt6 |  -.4810832    .051688            0.000    -.5823897   -.3797767
      kidge6 |   .0409155   .0471194            0.385    -.0514367    .1332678
       _cons |  -1.489209   5.422855            0.784    -12.11781    9.139393
------------------------------------------------------------------------------

gen lhours = log(hours)
(2348 missing values generated)

glm lhours nwifeinc educ exper expersq age kidlt6 kidge6

Iteration 0:   log likelihood = -1954.9002

Generalized linear models                          No. of obs      =      3286
Optimization     : ML                              Residual df     =      3278
                                                   Scale parameter =  .1928961
Deviance         =  632.3133243                    (1/df) Deviance =  .1928961
Pearson          =  632.3133243                    (1/df) Pearson  =  .1928961

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]

                                                   AIC             =  1.194705
Log likelihood   = -1954.900228                    BIC             = -25911.05

------------------------------------------------------------------------------
             |                 OIM
      lhours |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0018706   .0003327    -5.62   0.000    -.0025227   -.0012185
        educ |  -.2022625   .4403476    -0.46   0.646    -1.065328     .660803
       exper |  -.2074679   .4405366    -0.47   0.638    -1.070904     .655968
     expersq |  -.0001549   .0000812    -1.91   0.057     -.000314    4.33e-06
         age |   .2112264   .4403162     0.48   0.631    -.6517775     1.07423
      kidlt6 |  -.1944414   .0222299    -8.75   0.000    -.2380113   -.1508715
      kidge6 |  -.1256763   .0199962    -6.29   0.000    -.1648681   -.0864845
       _cons |   2.252439   2.640541     0.85   0.394    -2.922926    7.427805
------------------------------------------------------------------------------

sum lhours

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      lhours |      3286    3.497927    .4467688          0   4.787492

. di 3286*r(mean)
11494.189

. di -1954.9002 - 11494.189
-13449.089

. di -3538.4086 - 13449.089
-16987.498


c. The ET2T model is given below. Again, to properly compare its log likelihood, we must
subtract $\sum_i \log(hours_i)$ to obtain the final log likelihood. As must be the case, the ET2T
model fits better than the lognormal hurdle model, but the improvement is very slight. In fact,
the estimate of $\rho$ is very small (about .018) and not statistically different from zero. The
likelihood ratio statistic gives the same result, producing p-value = .862. Fortunately the
estimated coefficients are very similar across the two approaches, as we would hope with $\rho$ so
close to zero.

These findings are very different from what we found using the data in MROZ.RAW (see
Table 17.2). There, the estimate of $\rho$ is an implausible -.972. Without an exclusion restriction
(in either application) it is hard to be confident of the results. But with the current data set, we
are led to the lognormal hurdle model with all explanatory variables in the selection and
amount equations.


. heckman lhours nwifeinc educ exper expersq age kidlt6 kidge6,
      select(inlf = nwifeinc educ exper expersq age kidlt6 kidge6)

Heckman selection model                         Number of obs      =      5634
(regression model with sample selection)        Censored obs       =      2348
                                                Uncensored obs     =      3286

                                                Wald chi2(7)       =     93.35
Log likelihood = -5493.294                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lhours       |
    nwifeinc |   -.001911   .0003968    -4.82   0.000    -.0026886   -.0011333
        educ |  -.2023362   .4398277    -0.46   0.645    -1.064383    .6597103
       exper |  -.2079298   .4400233    -0.47   0.637     -1.07036    .6545001
     expersq |  -.0001578   .0000827    -1.91   0.056    -.0003198    4.21e-06
         age |   .2117355   .4398047     0.48   0.630    -.6502658    1.073737
      kidlt6 |  -.1964925   .0247921    -7.93   0.000     -.245084   -.1479009
      kidge6 |  -.1255154   .0199914    -6.28   0.000    -.1646978    -.086333
       _cons |   2.240756    2.63817     0.85   0.396    -2.929962    7.411474
-------------+----------------------------------------------------------------
inlf         |
    nwifeinc |  -.0091462    .000676   -13.53   0.000    -.0104711   -.0078214
        educ |  -.0628333   .9045172    -0.07   0.945    -1.835654    1.709988
       exper |  -.1574369   .9050687    -0.17   0.862    -1.931339    1.616465
     expersq |  -.0005568   .0001713    -3.25   0.001    -.0008925   -.0002211
         age |   .1633826   .9044773     0.18   0.857     -1.60936    1.936125
      kidlt6 |  -.4810173   .0516912    -9.31   0.000    -.5823302   -.3797043
      kidge6 |   .0410785   .0471309     0.87   0.383    -.0512964    .1334534
       _cons |  -1.491111   5.422743    -0.27   0.783    -12.11949     9.13727
-------------+----------------------------------------------------------------
     /athrho |   .0178479   .0959507     0.19   0.852    -.1702121    .2059078
    /lnsigma |  -.8239347   .0123732   -66.59   0.000    -.8481857   -.7996837
-------------+----------------------------------------------------------------
         rho |    .017846   .0959202                     -.1685871    .2030463
       sigma |   .4387021   .0054281                      .4281911    .4494711
      lambda |   .0078291   .0420881                      -.074662    .0903202
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =     0.03   Prob > chi2 = 0.8615

. di -5493.294 - 11494.189
-16987.483

. di 2*(16987.498 - 16987.483)
.03
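
As a rough cross-check of the likelihood ratio test, the statistic and its chi-square(1) p-value can be recomputed from the two adjusted log likelihoods. The sketch below uses only the Python standard library; the variable names are ours, and because the inputs are rounded to three decimals the p-value differs slightly from the .8615 that Stata reports.

```python
import math

ll_hurdle = -16987.498  # lognormal hurdle (rho restricted to zero), from part b
ll_et2t = -16987.483    # ET2T, from the heckman output

# Likelihood ratio statistic for H0: rho = 0 (one restriction).
lr = 2.0 * (ll_et2t - ll_hurdle)

# Survival function of a chi-square with one degree of freedom:
# P(chi2_1 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr / 2.0))

print(round(lr, 3), round(p_value, 3))
```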


d. The estimates for the amount part of the truncated normal hurdle model are given below. Because the participation equation is still the probit model we estimated earlier, we can compare the log likelihood for the truncated normal regression to that from the lognormal estimation in part b. The former is -12,445.76 and we already computed the latter as -13,449.09. Thus, in this example the TNH model fits substantially better than the LH model. The full log likelihood for the TNH model is -15,984.17 and this is much larger than that for the Tobit (a special case), -17,599.96.


. truncreg hours nwifeinc educ exper expersq age kidlt6 kidge6, ll(0)
(note: 2348 obs. truncated)

Fitting full model:

Limit:   lower =          0                     Number of obs   =       3286
         upper =       +inf                     Wald chi2(7)    =
Log likelihood = -12445.76                      Prob > chi2     =

------------------------------------------------------------------------------
       hours |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0439736   .0081584    -5.39   0.000    -.0599638   -.0279835
        educ |  -9.183178   11.12374    -0.83   0.409    -30.98531    12.61896
       exper |  -9.426741   11.12822    -0.85   0.397    -31.23765    12.38417
     expersq |  -.0024584   .0019888    -1.24   0.216    -.0063564    .0014396
         age |   9.470886   11.12299     0.85   0.395    -12.32978    31.27155
      kidlt6 |  -4.779305   .5444546    -8.78   0.000    -5.846417   -3.712194
      kidge6 |  -3.370223   .4896076    -6.88   0.000    -4.329837    -2.41061
       _cons |  -21.34309   66.70579    -0.32   0.749     -152.084    109.3979
-------------+----------------------------------------------------------------
      /sigma |   10.72244   .1347352    79.58   0.000     10.45836    10.98651
------------------------------------------------------------------------------

. di -3538.4086 - 12445.76
-15984.169


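17.16. a. Write $c_i = \psi + \bar{x}_i\xi + a_i$ and substitute to get

$$y_{it}^* = x_{it}\beta + \psi + \bar{x}_i\xi + a_i + u_{it}.$$

Now, conditional on $(x_i, a_i)$, $y_{it}$ follows a standard two-limit Tobit model. Therefore, the density is

$$f_t(y_t \mid x_i, a_i; \theta) = \{\Phi[(q_1 - x_{it}\beta - \psi - \bar{x}_i\xi - a_i)/\sigma_u]\}^{1[y_t = q_1]}$$
$$\cdot \{(1/\sigma_u)\,\phi[(y_t - x_{it}\beta - \psi - \bar{x}_i\xi - a_i)/\sigma_u]\}^{1[q_1 < y_t < q_2]}$$
$$\cdot \{\Phi[(-q_2 + x_{it}\beta + \psi + \bar{x}_i\xi + a_i)/\sigma_u]\}^{1[y_t = q_2]}.$$

By the conditional independence assumption, the joint density of $(y_{i1}, y_{i2}, \ldots, y_{iT})$ conditional on $(x_i, a_i)$ is

$$\prod_{t=1}^{T} f_t(y_t \mid x_i, a_i; \theta).$$

Now we integrate out $a_i$ to get the joint density of $(y_{i1}, y_{i2}, \ldots, y_{iT})$ given $x_i$:

$$\int_{-\infty}^{\infty} \left[ \prod_{t=1}^{T} f_t(y_t \mid x_i, a; \theta) \right] (1/\sigma_a)\,\phi(a/\sigma_a)\, da.$$

Now the log likelihood for a random draw $i$ is just

$$\ell_i(\theta) = \log\left\{ \int_{-\infty}^{\infty} \left[ \prod_{t=1}^{T} f_t(y_{it} \mid x_i, a; \theta) \right] (1/\sigma_a)\,\phi(a/\sigma_a)\, da \right\},$$

where $\theta$ is the vector of all parameters, including $\sigma_a^2$. As usual, we sum across all $i$ to get the log likelihood for the entire cross section.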

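b. This is no different from any of the other CRE models that we have covered. We can easily find $E(c_i)$ and $\mathrm{Var}(c_i)$ from $c_i = \psi + \bar{x}_i\xi + a_i$ because $E(a_i \mid x_i) = 0$ and $\mathrm{Var}(a_i \mid x_i) = \sigma_a^2$. In fact,

$$E(c_i) = E(\psi + \bar{x}_i\xi) = \psi + E(\bar{x}_i)\xi$$

and so a consistent estimator of $E(c_i)$ is

$$\hat{\mu}_c = \hat{\psi} + \left( N^{-1} \sum_{i=1}^{N} \bar{x}_i \right) \hat{\xi},$$

where $\hat{\psi}$ and $\hat{\xi}$ are the MLEs.

Next,

$$\mathrm{Var}(c_i) = \mathrm{Var}(\psi + \bar{x}_i\xi) + \mathrm{Var}(a_i) = \xi'\,\mathrm{Var}(\bar{x}_i)\,\xi + \sigma_a^2,$$

where we use the fact that $a_i$ and $\bar{x}_i$ are uncorrelated. So a consistent estimator is

$$\hat{\sigma}_c^2 = \hat{\xi}' \left[ N^{-1} \sum_{i=1}^{N} (\bar{x}_i - \bar{x})'(\bar{x}_i - \bar{x}) \right] \hat{\xi} + \hat{\sigma}_a^2,$$

where $\bar{x} = N^{-1} \sum_{i=1}^{N} \bar{x}_i$.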
c. We can get the average structural function by slightly modifying equation (17.66). First, the conditional mean is

$$E(y_{it} \mid x_{it}, c_i) = q_1 \Phi[(q_1 - x_{it}\beta - c_i)/\sigma_u]$$
$$+ \{\Phi[(q_2 - x_{it}\beta - c_i)/\sigma_u] - \Phi[(q_1 - x_{it}\beta - c_i)/\sigma_u]\}\, g(q_1, q_2, x_{it}\beta + c_i, \sigma_u^2)$$
$$+ q_2 \Phi[(-q_2 + x_{it}\beta + c_i)/\sigma_u],$$

where

$$g(q_1, q_2, z, \sigma^2) = z - \sigma\, \frac{\phi[(q_2 - z)/\sigma] - \phi[(q_1 - z)/\sigma]}{\Phi[(q_2 - z)/\sigma] - \Phi[(q_1 - z)/\sigma]}.$$

The ASF is obtained as a function of $x_t$ by averaging out $c_i$. But we can use iterated expectations, as usual, by first conditioning on $\bar{x}_i$ and then averaging out $\bar{x}_i$:

$$E_{c_i}[m(x_t, c_i)] = E_{\bar{x}_i}\{ E[m(x_t, c_i) \mid \bar{x}_i] \},$$

where $m(x_t, c) = E(y_{it} \mid x_{it} = x_t, c_i = c)$. Using the same argument from the one-limit CRE Tobit model,

$$h(x_t, \bar{x}_i) \equiv E[m(x_t, c_i) \mid \bar{x}_i] = q_1 \Phi[(q_1 - x_t\beta - \psi - \bar{x}_i\xi)/\sigma_v]$$
$$+ \{\Phi[(q_2 - x_t\beta - \psi - \bar{x}_i\xi)/\sigma_v] - \Phi[(q_1 - x_t\beta - \psi - \bar{x}_i\xi)/\sigma_v]\}\, g(q_1, q_2, x_t\beta + \psi + \bar{x}_i\xi, \sigma_v^2)$$
$$+ q_2 \Phi[(-q_2 + x_t\beta + \psi + \bar{x}_i\xi)/\sigma_v],$$

where $\sigma_v^2 = \sigma_a^2 + \sigma_u^2$. The ASF is consistently estimated as

$$\widehat{ASF}(x_t) = N^{-1} \sum_{i=1}^{N} \hat{h}(x_t, \bar{x}_i),$$

where $\hat{h}(\cdot, \cdot)$ denotes plugging in the MLEs of all parameters. Now take derivatives and changes with respect to $x_t$.
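
To make the averaging step concrete, here is a minimal Python sketch of the two-limit Tobit mean and of the sample average that estimates the ASF. It is an illustration under stated assumptions, not the manual's code: the function names are ours, `het_indices` stands for hypothetical estimated values of $\hat{\psi} + \bar{x}_i\hat{\xi}$, and `sigma_v` stands for the estimate of $\sigma_v$.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def two_limit_mean(index, sigma, q1, q2):
    """E(y) when y* = index + e, e ~ Normal(0, sigma^2), and y is y* censored to [q1, q2]."""
    a = (q1 - index) / sigma
    b = (q2 - index) / sigma
    interior = norm_cdf(b) - norm_cdf(a)
    # index*interior + sigma*(pdf(a) - pdf(b)) equals P(interior) * E(y* | interior).
    return (q1 * norm_cdf(a)
            + index * interior
            + sigma * (norm_pdf(a) - norm_pdf(b))
            + q2 * norm_cdf(-b))

def asf(xb, het_indices, sigma_v, q1, q2):
    """Average of h(x_t, xbar_i) over i, where xb is x_t*beta."""
    total = sum(two_limit_mean(xb + h, sigma_v, q1, q2) for h in het_indices)
    return total / len(het_indices)

# Hypothetical inputs, only to show the call.
print(asf(0.2, [-0.1, 0.0, 0.3], 0.5, 0.0, 1.0))
```

APEs for a continuous variable can then be approximated by evaluating `asf` at two nearby values of `xb` and differencing.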

d. Without assumption (17.78), we can just use a pooled two-limit Tobit analysis to estimate $\beta$, $\psi$, $\xi$, and $\sigma_v^2$. As in the standard Tobit case, we cannot separately estimate $\sigma_a^2$ and $\sigma_u^2$. But the APEs are still identified as they depend only on $\sigma_v^2$, as shown in part c.


17.17. a. Plug in the expressions for $c_{i1}$ and $c_{i2}$ to get

$$y_{it1} = \max(0, \alpha_1 y_{it2} + z_{it1}\delta_1 + \psi_1 + \bar{z}_i\xi_1 + a_{i1} + u_{it1})$$
$$y_{it2} = z_{it}\delta_2 + \psi_2 + \bar{z}_i\xi_2 + a_{i2} + u_{it2}$$

or

$$y_{it1} = \max(0, \alpha_1 y_{it2} + z_{it1}\delta_1 + \psi_1 + \bar{z}_i\xi_1 + v_{it1})$$
$$y_{it2} = z_{it}\delta_2 + \psi_2 + \bar{z}_i\xi_2 + v_{it2},$$

where $v_{it1} = a_{i1} + u_{it1}$ and $v_{it2} = a_{i2} + u_{it2}$. Given the assumptions on $D(u_{it1}, u_{it2} \mid z_i, a_i)$ and $D(a_{i1}, a_{i2} \mid z_i)$, it follows that $D(v_{it1}, v_{it2} \mid z_i) = D(v_{it1}, v_{it2})$ is bivariate normal with mean zero. Therefore, we can write

$$v_{it1} = \rho_1 v_{it2} + e_{it1}$$
$$D(e_{it1} \mid z_i, v_{it2}) = D(e_{it1}) = \mathrm{Normal}(0, \sigma_{e_1}^2).$$

It follows we can write

$$y_{it1} = \max(0, \alpha_1 y_{it2} + z_{it1}\delta_1 + \psi_1 + \bar{z}_i\xi_1 + \rho_1 v_{it2} + e_{it1})$$
$$D(e_{it1} \mid z_i, y_{it2}, v_{it2}) = \mathrm{Normal}(0, \sigma_{e_1}^2)$$

and now a pooled two-step method is immediate. First, obtain the residuals $\hat{v}_{it2}$ from the pooled regression

$$y_{it2} \text{ on } z_{it}, 1, \bar{z}_i, \quad t = 1, \ldots, T;\ i = 1, \ldots, N.$$

Then use pooled Tobit of

$$y_{it1} \text{ on } y_{it2}, z_{it1}, 1, \bar{z}_i, \hat{v}_{it2}$$

to estimate $\alpha_1$, $\delta_1$, $\psi_1$, $\xi_1$, $\rho_1$, and $\sigma_{e_1}^2$.

Incidentally, the statement of the problem said that $\{y_{it2}\}$ will not be strictly exogenous in the estimable equation. While that is true of the previously proposed solution, in other approaches $\{y_{it2}\}$ can be rendered strictly exogenous. Here is one possibility. Let $v_{i2} = (v_{i12}, \ldots, v_{iT2})$ be the entire history on the reduced form errors. Then, given the previous assumptions, $D(v_{it1} \mid z_i, v_{i2}) = D(v_{it1} \mid v_{i2})$. Because $v_{it1} = a_{i1} + u_{it1}$, it is reasonable to assume a Chamberlain-Mundlak representation, for example,

$$v_{it1} = \rho_1 v_{it2} + \eta_1 \bar{v}_{i2} + e_{it1},$$

where now $e_{it1}$ is independent of $(z_i, v_{i2})$ and therefore of $(z_i, v_{i2}, y_{i2})$, where $y_{i2} = (y_{i12}, \ldots, y_{iT2})$. This means that in the equation

$$y_{it1} = \max(0, \alpha_1 y_{it2} + z_{it1}\delta_1 + \psi_1 + \bar{z}_i\xi_1 + \rho_1 v_{it2} + \eta_1 \bar{v}_{i2} + e_{it1}),$$

$\{(y_{ir2}, z_{ir}, v_{ir2}) : r = 1, \ldots, T\}$ is strictly exogenous with respect to $e_{it1}$. The CF approach changes in that we add $\hat{\bar{v}}_{i2}$, the time average of the first-stage residuals, as an additional explanatory variable (along with $\hat{v}_{it2}$) in using pooled Tobit. Because of strict exogeneity, approaches that attempt to exploit the serial dependence in the scores are now possible.

b. As usual, the two-step nature of the estimation needs to be accounted for by using either 
the delta method or the panel bootstrap. In using the delta method, the serial dependence in the 
scores should be accounted for. It is automatically accounted for with the panel bootstrap 
because the cross section units are resampled. 
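
The panel bootstrap mentioned here can be sketched as follows. This is a minimal illustration, assuming each cross section unit is stored as a list holding its whole time series; `panel_bootstrap_se` and `pooled_mean` are hypothetical names, and in an actual application the `estimator` argument would rerun both the first-stage regression and the pooled Tobit on each resample.

```python
import random

def panel_bootstrap_se(units, estimator, reps=200, seed=12345):
    """Bootstrap standard error that resamples whole cross section units.

    units     : list with one entry per unit i (its full history over t)
    estimator : function mapping a list of units to a scalar estimate
    """
    rng = random.Random(seed)
    n = len(units)
    draws = []
    for _ in range(reps):
        # Draw N units with replacement; each unit keeps all of its time periods,
        # so serial dependence within a unit is preserved.
        resample = [units[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(resample))
    mean = sum(draws) / reps
    variance = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return variance ** 0.5

def pooled_mean(units):
    """Stand-in estimator: the pooled sample mean over all i and t."""
    values = [y for unit in units for y in unit]
    return sum(values) / len(values)

# Hypothetical three-unit, two-period panel.
print(panel_bootstrap_se([[0.0, 1.0], [2.0, 3.0], [4.0, 9.0]], pooled_mean, reps=50))
```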


c. We have used this approach several times. Let $m(z, \sigma^2)$ denote the unconditional mean function for the standard Tobit model. Then

$$ASF(y_{t2}, z_{t1}) = E_{(\bar{z}_i, v_{it2})}[m(\alpha_1 y_{t2} + z_{t1}\delta_1 + \psi_1 + \bar{z}_i\xi_1 + \rho_1 v_{it2}, \sigma_{e_1}^2)]$$

and so a consistent estimator is

$$\widehat{ASF}(y_{t2}, z_{t1}) = N^{-1} \sum_{i=1}^{N} m(\hat{\alpha}_1 y_{t2} + z_{t1}\hat{\delta}_1 + \hat{\psi}_1 + \bar{z}_i\hat{\xi}_1 + \hat{\rho}_1 \hat{v}_{it2}, \hat{\sigma}_{e_1}^2).$$

As usual, the estimated APEs are obtained by taking derivatives or changes with respect to $(y_{t2}, z_{t1})$.

17.18. a. Once we assume $z$ is exogenous in the structural equation (and $E(u_1 \mid z) = 0$ ensures exogeneity), we only need the rank condition. The assumption that $E(z'z)$ is nonsingular is not usually restrictive. The important condition with a single endogenous explanatory variable is

$$L(y_2 \mid z) \neq L(y_2 \mid z_1),$$

so there is at least one element of $z$ not in $z_1$ that explains variation in $y_2$.
b. We can draw on the optimal instrumental variables results from Section 8.6. The condition $E(u_1 \mid z) = 0$ ensures that any function of $z$ is a valid instrumental variable candidate, and also implies that $E(u_1^2 \mid z) = \mathrm{Var}(u_1 \mid z)$. Because $E(u_1^2 \mid z)$ is constant, from Theorem 8.5 the optimal IVs are

$$[E(y_2 \mid z), z_1].$$

If we think $D(y_2 \mid z)$ follows a standard Tobit then we should obtain $E(y_2 \mid z)$ from the Tobit model. Recall that if

$$D(y_2 \mid z) = \mathrm{Tobit}(z\delta_2, \tau_2^2)$$

then

$$E(y_2 \mid z) = \Phi(z\delta_2/\tau_2)\, z\delta_2 + \tau_2\, \phi(z\delta_2/\tau_2).$$

Therefore, if we run Tobit in a first stage, we get

$$\hat{m}_{i2} \equiv \hat{E}(y_{i2} \mid z_i) = \Phi(z_i\hat{\delta}_2/\hat{\tau}_2)\, z_i\hat{\delta}_2 + \hat{\tau}_2\, \phi(z_i\hat{\delta}_2/\hat{\tau}_2)$$

and then use IVs $(\hat{m}_{i2}, z_{i1})$ to estimate the equation

$$y_{i1} = \alpha_1 y_{i2} + z_{i1}\delta_1 + u_{i1}$$

by IV. This approach just identifies the parameters. We can get overidentification (if we have enough elements of $z_i$) by using all of $z_i$ in place of $z_{i1}$.

Provided we maintain $E(u_1 \mid z) = 0$, using Tobit fitted values as instruments is no less (and no more) robust than using 2SLS. As mentioned previously, any function of $z_i$ is valid as a potential instrument. Even if the Tobit model is incorrect, we know the quasi-MLEs converge very generally. Call the plims $\delta_2^*$ and $\tau_2^*$ and define

$$m_{i2}^* = \Phi(z_i\delta_2^*/\tau_2^*)\, z_i\delta_2^* + \tau_2^*\, \phi(z_i\delta_2^*/\tau_2^*),$$

which is just a function of $z_i$. Ruling out perfect collinearity in $(m_{i2}^*, z_{i1})$, the rank condition is

$$L(y_{i2} \mid m_{i2}^*, z_{i1}) \neq L(y_{i2} \mid z_{i1}),$$

which simply means that $m_{i2}^*$ should have some partial correlation with $y_{i2}$, something we would expect quite generally if $z_{i2}$ is partially correlated with $y_{i2}$.

Using $\hat{m}_{i2}$ as an instrument for $y_{i2}$ is preferred to using it as a regressor in place of $y_{i2}$. If we use $\hat{m}_{i2}$ as a regressor then we are effectively assuming $E(y_2 \mid z) = \Phi(z\delta_2/\tau_2)\, z\delta_2 + \tau_2\, \phi(z\delta_2/\tau_2)$ (and that we have consistent estimators of the parameters in this mean). Generally, the estimates of $\delta_1$ and $\alpha_1$ would be inconsistent if the Tobit model for $y_2$ is misspecified. When $\hat{m}_{i2}$ is used as an instrument, the implicit reduced form for the IV estimation is

$$L(y_{i2} \mid m_{i2}^*, z_{i1}) = \gamma_2 m_{i2}^* + z_{i1}\pi_2,$$

and we do not need $\gamma_2 = 1$ and $\pi_2 = 0$, as the plug-in-regressor method essentially does.


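c. We can write

$$y_1 = z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2 + e_1$$
$$E(e_1 \mid z, y_2, v_2) = 0.$$

It follows that

$$E(y_1 \mid z, y_2) = z_1\delta_1 + \alpha_1 y_2 + \rho_1 E(v_2 \mid z, y_2)$$

and we can compute $E(v_2 \mid z, y_2)$ given that $D(y_2 \mid z) = \mathrm{Tobit}(z\delta_2, \tau_2^2)$. In fact, as shown in Vella (1993, International Economic Review),

$$E(v_2 \mid z, y_2) = 1[y_2 > 0]\, v_2 - 1[y_2 = 0]\, \tau_2 \left[ \frac{\phi(z\delta_2/\tau_2)}{1 - \Phi(z\delta_2/\tau_2)} \right]$$
$$= 1[y_2 > 0]\, v_2 - 1[y_2 = 0]\, \tau_2\, \lambda(-z\delta_2/\tau_2),$$

where $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$ is the inverse Mills ratio. (This is an example of a generalized residual.)

Given the Tobit MLEs, we can easily construct

$$\hat{E}(v_{i2} \mid z_i, y_{i2}) = 1[y_{i2} > 0]\, \hat{v}_{i2} - 1[y_{i2} = 0]\, \hat{\tau}_2\, \lambda(-z_i\hat{\delta}_2/\hat{\tau}_2)$$

in a first stage, and then in a second stage run the OLS regression

$$y_{i1} \text{ on } z_{i1},\ y_{i2},\ 1[y_{i2} > 0]\, \hat{v}_{i2} - 1[y_{i2} = 0]\, \hat{\tau}_2\, \lambda(-z_i\hat{\delta}_2/\hat{\tau}_2)$$

to consistently estimate $\delta_1$, $\alpha_1$, and $\rho_1$.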
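
The generalized residual that serves as the control function is simple to compute once the first-stage Tobit estimates are in hand. The following Python sketch is illustrative only: the function names are ours, and `z_delta2` and `tau2` stand for the first-stage index $z_i\hat{\delta}_2$ and scale $\hat{\tau}_2$.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c)."""
    return norm_pdf(c) / norm_cdf(c)

def generalized_residual(y2, z_delta2, tau2):
    """Tobit generalized residual, E(v2 | z, y2), at the given parameter values."""
    if y2 > 0:
        # Uncensored observation: v2 = y2 - z*delta2 is observed.
        return y2 - z_delta2
    # Censored observation: E(v2 | v2 <= -z*delta2).
    return -tau2 * inverse_mills(-z_delta2 / tau2)

print(generalized_residual(2.0, 0.5, 1.0), generalized_residual(0.0, 0.5, 1.0))
```

The second-stage OLS regression then adds this variable to $z_{i1}$ and $y_{i2}$; its coefficient estimates $\rho_1$.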
Because the CF approach is based on $E(y_1 \mid z, y_2)$, nothing important changes if we start with

$$y_1 = g_1(z_1, y_2)\beta_1 + u_1.$$

The same reasoning as before gets us to

$$E(y_1 \mid z, y_2) = g_1(z_1, y_2)\beta_1 + \rho_1\{1[y_2 > 0]\, v_2 - 1[y_2 = 0]\, \tau_2\, \lambda(-z\delta_2/\tau_2)\}$$

and so adding the same CF as before works for consistently estimating $\beta_1$. Of course, the interpretation of $\beta_1$ depends on the nature of the functions in $g_1(z_1, y_2)$.

d. The 2SLS estimator that effectively ignores the nature of $y_2$ is simple and fully robust, assuming we have at least one valid instrument for $y_2$. Standard errors (robust to heteroskedasticity) are easy to obtain. Its primary drawback is that it may be (asymptotically) inefficient compared with the other methods. An additional shortcoming is that if we use general functions $g_1(z_1, y_2)$ we need to decide on instruments for any function that includes $y_2$. [Remember we are generally not allowed to plug in a fitted value to obtain $g_1(z_{i1}, \hat{y}_{i2})$ and then regress $y_{i1}$ on $g_1(z_{i1}, \hat{y}_{i2})$.]

The method of using the Tobit fitted value as the IV for $y_2$ is just as robust as the 2SLS estimator, yet it exploits the corner solution nature of $y_2$. It need not be more (asymptotically) efficient than 2SLS, but it could be, even if the Tobit model for $y_2$ is misspecified [or $\mathrm{Var}(u_1 \mid z)$ is heteroskedastic, or both]. That we have estimated the instruments in a first stage can be ignored in the $\sqrt{N}$-asymptotic distribution of the IV estimator. As with the 2SLS estimator, having general functions $g_1(z_1, y_2)$ means we would have to obtain IVs for all endogenous functions. This is almost always possible but is not always obvious.

The CF method is simple to compute but the standard errors generally have to account for the two-step estimation unless $\rho_1 = 0$. (The CF method provides a simple test of the null that $y_2$ is exogenous: just use a heteroskedasticity-robust $t$ statistic for $\hat{\rho}_1$.) Another drawback to the CF method is that it is derived assuming the Tobit model for $y_2$ holds. Generally, it is inconsistent if the Tobit model fails (just like using Tobit fitted values as regressors rather than instruments). An advantage of the CF method is that, as discussed in part c, it is easily applied for general functions $g_1(z_1, y_2)$. In such cases, the CF method is likely to be more efficient asymptotically than 2SLS or the IV method described in part b.

e. If we assume joint normality of $(u_1, v_2)$ (and independence from $z$) then MLE becomes attractive. It will give the asymptotically efficient estimators and there is no two-step estimation issue to deal with (as in the CF case). The log likelihood is a bit tricky to obtain because it depends on $D(y_1 \mid y_2, z)$. We already know that $D(y_2 \mid z)$ follows a Tobit. We also know $D(y_1 \mid v_2, z)$ follows a classical linear regression model with mean $z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2$ and variance $\sigma_{e_1}^2$. For $y_2 > 0$, $D(y_1 \mid y_2, z) = D(y_1 \mid v_2, z)$. For $y_2 = 0$, we have to integrate over $v_2 \leq -z\delta_2$, just like in Problem 17.6. When we have the two densities, we use, for each $i$,

$$\log f_1(y_{i1} \mid y_{i2}, z_i; \theta) + \log f_2(y_{i2} \mid z_i; \delta_2, \tau_2^2)$$

as the log likelihood.


17.19. a. Because of the conditional independence assumption we have

$$f(y_1, \ldots, y_G \mid x, c; \gamma_o) = \prod_{g=1}^{G} f_g(y_g \mid x, c; \gamma_o^g)$$

for dummy arguments $(y_1, \ldots, y_G)$.

b. To obtain the density of $(y_1, \ldots, y_G)$ given $x$ we integrate out $c$:

$$g(y_1, \ldots, y_G \mid x; \gamma_o, \delta_o) = \int \left[ \prod_{g=1}^{G} f_g(y_g \mid x, c; \gamma_o^g) \right] h(c \mid x; \delta_o)\, dc,$$

where, in general, the integral is a multiple integral. Also, we have indicated $c$ as a continuous random vector but it need not be.

c. The log likelihood for a random draw $i$ is simply

$$\ell_i(\theta) = \log\left\{ \int \left[ \prod_{g=1}^{G} f_g(y_{ig} \mid x_i, c; \gamma^g) \right] h(c \mid x_i; \delta)\, dc \right\},$$

where $\theta$ contains all parameters.

17.20. a. The Stata output is given below. The signs of the coefficients are generally what we expect: lagged hours has a positive coefficient, as does initial hours in 1980. Thus, unobserved heterogeneity that positively affects hours worked in 1980 also positively affects hours contemporaneously. The variables nwifeinc, ch0_2, and ch3_5 all have negative and statistically significant coefficients. The one slight puzzle is that the older children variable has a positive and just statistically significant coefficient.


use \mitbooki_2e\statafiles\psid80_92, clear
tsset id year

* Lagged dependent variable:
bysort id (year): gen hours_1 = L.hours

* Put initial condition in years 81-92:
by id: gen hours80 = hours[1]

* Create exogenous variables for years 81-92:
forv i=81/92 {
    by id: gen nwifeinc`i' = nwifeinc[`i'-80]
}

forv i=81/92 {
    by id: gen ch0_2_`i' = ch0_2[`i'-80]
}

forv i=81/92 {
    by id: gen ch3_5_`i' = ch3_5[`i'-80]
}

forv i=81/92 {
    by id: gen ch6_17_`i' = ch6_17[`i'-80]
}

forv i=81/92 {
    by id: gen marr`i' = marr[`i'-80]
}

xttobit hours hours_1 hours80 nwifeinc nwifeinc81-nwifeinc92
    ch0_2 ch0_2_81-ch0_2_92 ch3_5 ch3_5_81-ch3_5_92
    ch6_17 ch6_17_81-ch6_17_92 marr marr81-marr92 y82-y92, ll(0) re


note: marr86 omitted because of collinearity
note: marr89 omitted because of collinearity
note: marr90 omitted because of collinearity
note: marr91 omitted because of collinearity
note: marr92 omitted because of collinearity

Random-effects tobit regression                 Number of obs      =     10776
Group variable: id                              Number of groups   =       898

Random effects u_i ~ Gaussian                   Obs per group: min =        12
                                                               avg =      12.0
                                                               max =        12

                                                Wald chi2(73)      =   7997.27
Log likelihood  = -62882.574                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       hours |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     hours_1 |   .7292676   .0119746            0.000     .7057978    .7527375
     hours80 |   .2943114   .0181831            0.000     .2586731    .3299496
    nwifeinc |  -1.286033   .3221528            0.000    -1.917441   -.6546256
  nwifeinc81 |   .6329715    1.08601            0.560     -1.49557    2.761513
  nwifeinc82 |   .1812886   1.914677            0.925     -3.57141    3.933987
  nwifeinc83 |  -.6567582   1.822493            0.719    -4.228778    2.915262
  nwifeinc84 |  -.9568491   1.344172            0.477    -3.591379    1.677681
  nwifeinc85 |  -1.169828   1.186202            0.324    -3.494742    1.155085
  nwifeinc86 |    .437133   1.142004            0.702    -1.801153    2.675419
  nwifeinc87 |   -2.53217   1.067478            0.018    -4.624388
  nwifeinc88 |  -.8224415   .6884551            0.232    -2.171789
  nwifeinc89 |   1.325135    .792212            0.094    -.2275717
  nwifeinc90 |   .0811052   .5898146            0.891     -1.07491
  nwifeinc91 |   1.550942    .861745            0.072    -.1380467
  nwifeinc92 |  -.6307469   .7778347            0.417    -2.155275     .893781
       ch0_2 |  -146.0974   21.04471            0.000    -187.3443   -104.8506
    ch0_2_81 |              93.39678            0.068    -12.33577    353.7729
    ch0_2_82 |              97.13967            0.358    -101.0535     279.727
    ch0_2_83 |                                  0.393    -111.6281    284.0928
    ch0_2_84 |                                  0.494    -265.3612    128.1699
    ch0_2_85 |                                  0.908    -200.6478    178.2933
    ch0_2_86 |                                  0.442     -129.386    296.4588
    ch0_2_87 |                                  0.483    -295.2784    139.6352
    ch0_2_88 |                                  0.554    -362.1352    194.0681
    ch0_2_89 |                                  0.820     -365.528    461.8385
    ch0_2_90 |                                  0.864    -182.7952    217.7811
    ch0_2_91 |                                  0.201    -66.01852     313.534
    ch0_2_92 |                                  0.569    -213.8578    117.5092
       ch3_5 |  -80.13216                       0.000    -115.1221   -45.14226
    ch3_5_81 |   39.44899                       0.526    -82.48514    161.3831
    ch3_5_82 |   102.3494                       0.158      -39.766    244.4647
    ch3_5_83 |  -38.86165                       0.600    -184.0828    106.3595
    ch3_5_84 |  -101.8966                       0.279    -286.5304    82.73714
    ch3_5_85 |  -4.967801                       0.960    -199.5553    189.6197
    ch3_5_86 |  -25.96859                       0.796     -223.279    171.3418
    ch3_5_87 |    5.59682                       0.955    -187.2517    198.4453
    ch3_5_88 |   46.38591                       0.621    -137.4644    230.2362
    ch3_5_89 |  -95.69263                       0.460    -349.7709    158.3856
    ch3_5_90 |   43.70922                       0.736    -209.9579    297.3763
    ch3_5_91 |   147.7391                       0.303    -133.5105    428.9886
    ch3_5_92 |  -166.5773                       0.438    -587.1694    254.0149
      ch6_17 |   22.18895                       0.028     2.421976
   ch6_17_81 |    4.64258                       0.902    -69.21744
   ch6_17_82 |   64.27872                       0.244    -43.79223
   ch6_17_83 |  -66.82245   57.25136            0.243     -179.033
   ch6_17_84 |   1.173452   56.00241     0.02   0.983    -108.5893    110.9362
   ch6_17_85 |   6.738214   54.27217     0.12   0.901    -99.63328    113.1097
   ch6_17_86 |   85.64549   57.28103     1.50   0.135    -26.62327    197.9142
   ch6_17_87 |  -65.96152    62.6244    -1.05   0.292    -188.7031    56.78006
   ch6_17_88 |    19.1112   56.21565     0.34   0.734    -91.06945    129.2918
   ch6_17_89 |    4.85883   61.37184     0.08   0.937    -115.4278    125.1454
   ch6_17_90 |   16.18911   60.09357     0.27   0.788    -101.5921    133.9703
   ch6_17_91 |  -21.25498   55.55783    -0.38   0.702    -130.1463    87.63636
   ch6_17_92 |  -7.632119   53.88032    -0.14   0.887    -113.2356    97.97137
        marr |  -199.1315    144.719    -1.38   0.169    -482.7755    84.51247
      marr81 |   127.5178    356.477     0.36   0.721    -571.1642    826.1998
      marr82 |  -13.59679    491.133    -0.03   0.978    -976.1997    949.0062
      marr83 |  -507.5586   434.9469    -1.17   0.243    -1360.039    344.9217
      marr84 |   1318.284   564.6247     2.33   0.020     211.6404    2424.928
      marr85 |  -326.1983   585.7084    -0.56   0.578    -1474.166     821.769
      marr86 |  (omitted)
      marr87 |    131.824     331.72     0.40   0.691    -518.3353    781.9832
      marr88 |  -491.7295   306.7196    -1.60   0.109    -1092.889    109.4299
      marr89 |  (omitted)
      marr90 |  (omitted)
      marr91 |  (omitted)
      marr92 |  (omitted)
         y82 |  -32.79071   25.88734    -1.27   0.205    -83.52896    17.94755
         y83 |   20.40184   25.84829     0.79   0.430    -30.25988    71.06355
         y84 |   105.7757    25.7722     4.10   0.000      55.2631    156.2883
         y85 |   26.36698   25.95325     1.02   0.310    -24.50046    77.23441
         y86 |   26.82807   25.99402     1.03   0.302    -24.11928    77.77542
         y87 |  -.1477861   26.16878    -0.01   0.995    -51.43764    51.14207
         y88 |   21.84475   26.28302     0.83   0.406    -29.66903    73.35853
         y89 |   33.76287   26.39745     1.28   0.201    -17.97518    85.50092
         y90 |   30.54594   26.52445     1.15   0.249    -21.44102     82.5329
         y91 |   29.17601   26.64107     1.10   0.273    -23.03953    81.39155
         y92 |  -27.66915   26.97277    -1.03   0.305    -80.53481    25.19651
       _cons |  -165.6397   47.85094    -3.46   0.001    -259.4258   -71.85356
-------------+----------------------------------------------------------------
    /sigma_u |   310.4876   12.44431    24.95   0.000     286.0972     334.878
    /sigma_e |   508.4561   4.327479   117.49   0.000     499.9744    516.9378
-------------+----------------------------------------------------------------
         rho |   .2716099   .0164159                      .2404141    .3046996
------------------------------------------------------------------------------

  Observation summary:      2835  left-censored observations
                            7941     uncensored observations
                               0 right-censored observations

b. The Stata commands below produce the scale factor for the APE of a continuous explanatory variable, evaluated at hours_{t-1} = 0. All other variables are averaged out, and the scale factor is for 1992. The APE for nwifeinc in 1992 is about -.742. Because nwifeinc is in $1,000s, the coefficient implies that a $10,000 increase in other sources of income decreases estimated annual hours by about 7.4. This is a small economic effect given that the average hours in 1992 is about 1,155, and a $10,000 increase is fairly large.


. predict xbh, xb
(898 missing values generated)

. gen xbh_h0 = xbh - _b[hours_1]*hours_1
(898 missing values generated)

. gen scale = normal(xbh_h0/sqrt(_b[/sigma_u]^2 + _b[/sigma_e]^2))
(898 missing values generated)

. sum scale if y92

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       scale |       898    .5769774    .1692471   .0649065   .9402357

. di .5769774*_b[nwifeinc]
-.74201219

. sum hours nwifeinc if y92

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       hours |       898    1155.318     899.656          0       3916
    nwifeinc |       898    43.57829     44.2727  -7.249999    601.504
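
The APE arithmetic in part b amounts to multiplying the averaged scale factor by the coefficient. A small Python check, with variable names of our choosing and the inputs taken from the Stata output, might look like this:

```python
# Inputs copied from the Stata output; the names are illustrative.
scale_mean_1992 = 0.5769774   # average of the normal-cdf scale factor in 1992
b_nwifeinc = -1.286033        # xttobit coefficient on nwifeinc

# APE of nwifeinc (measured in $1,000s) on expected annual hours.
ape = scale_mean_1992 * b_nwifeinc

# Effect of a $10,000 increase, that is, ten units of nwifeinc.
effect_10k = 10 * ape

print(round(ape, 4), round(effect_10k, 2))
```

The tiny discrepancy from Stata's -.74201219 arises because Stata uses the coefficient at full precision.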


c. Because ch0_2 is a discrete variable, we compute the difference in the conditional mean function and then average. The APE in 1992 in moving from zero to one small child is about -116.47, which means average annual hours fall by about 116.5 hours.
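
For a single observation, the difference in Tobit means that is being averaged can be sketched in Python as follows. This is a hedged illustration: the function names are ours, the parameter values are the xttobit estimates reported in part a, and `index_without_child` is a hypothetical value of the linear index with ch0_2 set to zero.

```python
import math

SIGMA_U = 310.4876    # estimate of sigma_u from the xttobit output
SIGMA_E = 508.4561    # estimate of sigma_e from the xttobit output
B_CH0_2 = -146.0974   # coefficient on ch0_2

# Composite standard deviation, sqrt(sigma_u^2 + sigma_e^2).
sigma_v = math.hypot(SIGMA_U, SIGMA_E)

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tobit_mean(index, sigma):
    """E(y) for a standard Tobit with corner at zero: Phi(i/s)*i + s*phi(i/s)."""
    return norm_cdf(index / sigma) * index + sigma * norm_pdf(index / sigma)

def child_effect(index_without_child):
    """Change in expected hours when ch0_2 moves from zero to one."""
    with_child = tobit_mean(index_without_child + B_CH0_2, sigma_v)
    without_child = tobit_mean(index_without_child, sigma_v)
    return with_child - without_child

print(child_effect(1000.0))
```

Because the derivative of the Tobit mean with respect to the index is a probability, each individual effect lies between the coefficient and zero, and the effect approaches the coefficient itself when the index is very large.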


. gen xbh_c0 = xbh - _b[ch0_2]*ch0_2
(898 missing values generated)

. gen xbh_c1 = xbh_c0 + _b[ch0_2]
(898 missing values generated)

. gen mean0 = normal(xbh_c0/sqrt(_b[/sigma_u]^2 + _b[/sigma_e]^2))*xbh_c0
>     + sqrt(_b[/sigma_u]^2 + _b[/sigma_e]^2)
>     *normalden(xbh_c0/sqrt(_b[/sigma_u]^2 + _b[/sigma_e]^2))
(898 missing values generated)

. gen mean1 = normal(xbh_c1/sqrt(_b[/sigma_u]^2 + _b[/sigma_e]^2))*xbh_c1
>     + sqrt(_b[/sigma_u]^2 + _b[/sigma_e]^2)
>     *normalden(xbh_c1/sqrt(_b[/sigma_u]^2 + _b[/sigma_e]^2))
(898 missing values generated)

. gen diff = mean1 - mean0
(898 missing values generated)

. sum diff if y92

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        diff |       898   -116.4689    38.50869  -146.0974  -7.479618

Solutions to Chapter 18 Problems

18.1. a. This is a simple problem in univariate calculus. Write $g(\mu) = \mu_o \log(\mu) - \mu$ for $\mu > 0$. Then $dg(\mu)/d\mu = \mu_o/\mu - 1$, so $\mu = \mu_o$ uniquely sets the derivative to zero. The second derivative of $g(\cdot)$ is $-\mu_o \mu^{-2} < 0$ for all $\mu > 0$, so the sufficient second order condition for a maximum is satisfied.
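
A quick numerical illustration of part a: evaluate $g(\mu) = \mu_o \log(\mu) - \mu$ on a grid and confirm that the maximizer sits at $\mu_o$. The value of `mu_o` and the grid below are hypothetical choices made only for this sketch.

```python
import math

def poisson_objective(mu, mu_o):
    """Expected Poisson quasi-log likelihood, up to terms not depending on mu."""
    return mu_o * math.log(mu) - mu

mu_o = 2.5                                    # hypothetical true mean
grid = [0.01 * k for k in range(1, 1001)]     # candidate values 0.01, ..., 10.00

# Grid point with the largest objective value.
best = max(grid, key=lambda m: poisson_objective(m, mu_o))

print(best)
```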

b. For the exponential case, $g(\mu) = E[\ell_i(\mu)] = -\mu_o/\mu - \log(\mu)$. The first order condition is $\mu_o \mu^{-2} - \mu^{-1} = 0$, which is uniquely solved by $\mu = \mu_o$. The second derivative is $-2\mu_o \mu^{-3} + \mu^{-2}$, which, when evaluated at $\mu_o$, gives $-2\mu_o^{-2} + \mu_o^{-2} = -\mu_o^{-2} < 0$.

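18.2. When $m(x, \beta) = \exp(x\beta)$, we have $s_i(\hat{\beta}) = \exp(x_i\hat{\beta})\, x_i' \hat{u}_i/\exp(x_i\hat{\beta}) = x_i' \hat{u}_i$, where $\hat{u}_i = y_i - \exp(x_i\hat{\beta})$. Further, the Hessian $H_i(\beta)$ does not depend on $y_i$, and

$$A_i(\hat{\beta}) = -H_i(\hat{\beta}) = [\exp(x_i\hat{\beta})]^2 x_i' x_i/\exp(x_i\hat{\beta}) = \exp(x_i\hat{\beta})\, x_i' x_i.$$

Therefore, we can write equation (18.14) as

$$\left( \sum_{i=1}^{N} \exp(x_i\hat{\beta})\, x_i' x_i \right)^{-1} \left( \sum_{i=1}^{N} \hat{u}_i^2\, x_i' x_i \right) \left( \sum_{i=1}^{N} \exp(x_i\hat{\beta})\, x_i' x_i \right)^{-1}.$$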


18.3. a. The Stata output is below. Neither the price nor income variable is significant at 
any reasonable significance level, although the coefficient estimates are the expected sign. It 
does not matter whether we use the usual or robust standard errors. The two variables are 
jointly insignificant, too, using the usual and heteroskedasticity-robust tests (p-values = .490, 


.344, respectively). 
. use smoke

. reg cigs lcigpric lincome restaurn white educ age agesq

      Source |       SS       df       MS              Number of obs =     807
-------------+------------------------------           F(  7,   799) =    6.38
       Model |  8029.43631     7  1147.06233           Prob > F      =  0.0000
    Residual |  143724.246   799  179.880158           R-squared     =  0.0529
-------------+------------------------------           Adj R-squared =  0.0446
       Total |  151753.683   806  188.280003           Root MSE      =  13.412

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.8509044   5.782321    -0.15   0.883    -12.20124    10.49943
     lincome |   .8690144   .7287636     1.19   0.233     -.561503    2.299532
    restaurn |  -2.865621   1.117406    -2.56   0.011    -5.059019   -.6722234
       white |  -.5592363   1.459461    -0.38   0.702    -3.424067    2.305594
        educ |  -.5017533   .1671677    -3.00   0.003     -.829893   -.1736135
         age |   .7745021   .1605158     4.83   0.000     .4594197    1.089585
       agesq |  -.0090686   .0017481    -5.19   0.000    -.0124999   -.0056373
       _cons |  -2.682435   24.22073    -0.11   0.912    -50.22621    44.86134
------------------------------------------------------------------------------

. test lcigpric lincome

 ( 1)  lcigpric = 0
 ( 2)  lincome = 0

       F(  2,   799) =    0.71
            Prob > F =    0.4899

. reg cigs lcigpric lincome restaurn white educ age agesq, robust

Linear regression                                      Number of obs =     807
                                                       F(  7,   799) =    9.38
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0529
                                                       Root MSE      =  13.412

------------------------------------------------------------------------------
             |               Robust
        cigs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.8509044   6.054396    -0.14   0.888     -12.7353     11.0335
     lincome |   .8690144    .597972     1.45   0.147    -.3047672    2.042796
    restaurn |  -2.865621   1.017275    -2.82   0.005    -4.862469    -.868774
       white |  -.5592363   1.378283    -0.41   0.685     -3.26472    2.146247
        educ |  -.5017533   .1624097    -3.09   0.002    -.8205533   -.1829532
         age |   .7745021   .1380317     5.61   0.000     .5035545     1.04545
       agesq |  -.0090686   .0014589    -6.22   0.000    -.0119324   -.0062048
       _cons |  -2.682435   25.90194    -0.10   0.918    -53.52632    48.16145
------------------------------------------------------------------------------

. test lcigpric lincome

 ( 1)  lcigpric = 0
 ( 2)  lincome = 0

       F(  2,   799) =    1.07
            Prob > F =    0.3441

b. While the price variable is still highly insignificant (p-value = .46), the income variable, based on the usual Poisson standard errors, is very significant: t = 5.11. Both estimates are elasticities: the estimated price elasticity is -.106 and the estimated income elasticity is .104. Incidentally, if you drop restaurn (a binary indicator for restaurant smoking restrictions at the state level) then lcigpric becomes much more significant (using the MLE standard errors). In this data set, both cigpric and restaurn vary only at the state level, and, not surprisingly, they are significantly correlated. (States that have restaurant smoking restrictions also have higher average cigarette prices, on the order of 2.9%.)


. poisson cigs lcigpric lincome restaurn white educ age agesq

Poisson regression                                Number of obs   =        807
                                                  LR chi2(7)      =    1068.70
                                                  Prob > chi2     =     0.0000
Log likelihood = -8111.519                        Pseudo R2       =     0.0618

------------------------------------------------------------------------------
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.1059607   .1433932    -0.74   0.460    -.3870061    .1750847
     lincome |   .1037275   .0202811     5.11   0.000     .0639772    .1434779
    restaurn |  -.3636059   .0312231   -11.65   0.000    -.4248021   -.3024098
       white |  -.0552012   .0374207    -1.48   0.140    -.1285444    .0181421
        educ |  -.0594225   .0042564   -13.96   0.000    -.0677648   -.0510802
         age |   .1142571   .0049694    22.99   0.000     .1045172    .1239969
       agesq |  -.0013708    .000057   -24.07   0.000    -.0014825   -.0012592
       _cons |   .3964494   .6139626     0.65   0.518    -.8068952    1.599794
------------------------------------------------------------------------------


c. The GLM estimate of $\sigma$ is about $\hat{\sigma} = 4.51$. This means all of the Poisson standard errors should be multiplied by this factor, as is done using the glm command in Stata, with the sca(x2) option. The t statistic on lcigpric is now very small (-.16), and that on lincome falls to 1.13, much more in line with the linear model t statistic (1.19 with the usual standard errors). Clearly, using the maximum likelihood standard errors is very misleading in this example. With the GLM standard errors, the restaurant restriction variable, education, and the age variables are still significant. (There is no race effect, conditional on the other covariates.)


. glm cigs lcigpric lincome restaurn white educ age agesq, family(poisson) sca(x2)

Generalized linear models                          No. of obs      =       807
Optimization     : ML                              Residual df     =       799
                                                   Scale parameter =
Deviance         =  14752.46933                    (1/df) Deviance =  18.46367
Pearson          =  16232.70987                    (1/df) Pearson  =  20.31628

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  20.12272
Log likelihood   = -8111.519022                    BIC             =  9404.504

------------------------------------------------------------------------------
             |                 OIM
        cigs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lcigpric |  -.1059607   .6463244    -0.16   0.870    -1.372733    1.160812
     lincome |   .1037275   .0914144     1.13   0.257    -.0754414    .2828965
    restaurn |  -.3636059   .1407338    -2.58   0.010    -.6394391   -.0877728
       white |  -.0552011   .1686685    -0.33   0.743    -.3857854
        educ |  -.0594225   .0191849    -3.10   0.002    -.0970243
         age |   .1142571   .0223989     5.10   0.000     .0703561
       agesq |  -.0013708   .0002567    -5.34   0.000     -.001874
       _cons |   .3964493    2.76735     0.14   0.886    -5.027457
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion.)

. di sqrt(20.31628)
4.5073584


d. The usual LR statistic is about $LR = 2(8125.291 - 8111.519) = 27.54$, which is a very large value in a $\chi^2_2$ distribution (p-value $\approx 0$). The QLR statistic divides the usual LR statistic by $\hat{\sigma}^2 = 20.32$, so $QLR = 1.36$ (p-value $\approx .51$). As expected, the QLR statistic shows that the variables are jointly insignificant, while the LR statistic shows strong statistical significance.
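
The arithmetic in part d can be cross-checked outside Stata. The short Python sketch below is an illustration only and is not part of the original solution; the function name `chi2_tail_df2` is hypothetical, and it uses the fact that the upper-tail probability of a chi-square variate with 2 degrees of freedom is exp(-x/2).

```python
import math

def chi2_tail_df2(x):
    """Upper-tail probability of a chi-square(2) variate: exp(-x/2)."""
    return math.exp(-x / 2.0)

# Log-likelihoods reported in the Stata output (restricted, unrestricted).
ll_restricted = -8125.291
ll_unrestricted = -8111.519
sigma2_hat = 20.31628          # (1/df) Pearson from the GLM output

lr = 2.0 * (ll_unrestricted - ll_restricted)   # usual LR statistic
qlr = lr / sigma2_hat                           # quasi-LR statistic

print(round(lr, 3), round(qlr, 3))
print(chi2_tail_df2(lr), chi2_tail_df2(qlr))
```

The unrounded QLR gives a p-value very close to the .51 reported in the text, which is computed from the rounded value 1.36.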


poisson cigs restaurn white educ age agesq

Iteration 0:   log likelihood =  -8125.618
Iteration 1:   log likelihood = -8125.2907
Iteration 2:   log likelihood = -8125.2906

Poisson regression                                Number of obs   =
                                                  LR chi2(5)      =
                                                  Prob > chi2     =
Log likelihood = -8125.2906                       Pseudo R2       =

             |      Coef.   Std. Err.      z       [95% Conf. Interval]
-------------+---------------------------------------------------------
    restaurn |  -.3545336   .0308796   -11.48     -.4150564   -.2940107
       white |  -.0618025    .037371              -.1350483    .0114433
        educ |  -.0532166   .0040652              -.0611842   -.0452489
         age |   .1211174   .0048175               .1116754    .1305594
       agesq |  -.0014458   .0000553              -.0015543   -.0013374
       _cons |   .7617484   .1095991               .5469381    .9765587


di 2*(8125.291 - 8111.519) 


27.544 


di 27.54/20.32 
1.355315 


. di chi2tail(2,1. 
.50661699 


e. Using the robust standard errors does not change any conclusions; in fact, most explanatory variables become slightly more significant than when we use the GLM standard errors. In this example, it is the adjustment by $\hat{\sigma} > 1$ that makes the most difference. Having fully robust standard errors has no additional effect once we account for the severe overdispersion.


glm cigs lcigpric lincome restaurn white educ age agesq, family(poisson) robust


Generalized linear models                          No. of obs      =       807
Optimization     : ML                              Residual df     =       799
                                                   Scale parameter =
Deviance         =  14752.46933                    (1/df) Deviance =  18.46367
Pearson          =  16232.70987                    (1/df) Pearson  =  20.31628

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  20.12272
Log pseudolikelihood = -8111.519022                BIC             =  9404.504

             |               Robust
             |      Coef.   Std. Err.    P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------
    lcigpric |  -.1059607   .6681827     0.874    -1.415575    1.203653
     lincome |   .1037275    .083299     0.213    -.0595355    .2669906
    restaurn |  -.3636059    .140366     0.010    -.6387182   -.0884937
       white |  -.0552011   .1632959     0.735    -.3752553     .264853
        educ |  -.0594225   .0192058     0.002    -.0970653   -.0217798
         age |   .1142571   .0212322     0.000     .0726427    .1558715
       agesq |  -.0013708   .0002446     0.000    -.0018503   -.0008914
       _cons |   .3964493    2.97704     0.894    -5.438442     6.23134
f. We simply compute the turning point for the quadratic:

$\hat{\beta}_{age}/(-2\hat{\beta}_{age^2}) = .1143/[2(.00137)] \approx 41.72$, or at about 42 years of age.
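
As a quick check of the turning-point formula, the following Python sketch (an illustration; the helper name `quadratic_turning_point` is hypothetical) plugs in the unrounded Poisson coefficients on age and agesq from the output above.

```python
def quadratic_turning_point(b_linear, b_squared):
    """Return x* = -b_linear / (2 * b_squared), the stationary point of
    b_linear * x + b_squared * x**2."""
    return -b_linear / (2.0 * b_squared)

# Poisson coefficients on age and agesq from the output above.
turning_age = quadratic_turning_point(0.1142571, -0.0013708)
print(round(turning_age, 2))
```

With the unrounded coefficients the value is marginally below the figure obtained from the rounded coefficients, and it still rounds to about 42 years.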

g. A double-hurdle model, which separates the initial decision to smoke at all from the decision of how much to smoke, seems like a good idea. Variables such as level of education, income, and age could have very different effects on the decision to smoke versus how much to smoke. It is certainly worth investigating. One approach is to model $D(y|x, y \geq 1)$ as, say, a truncated Poisson distribution, and then to model $P(y = 0|x)$ as a logit or probit (with parameters free to vary from the truncated Poisson distribution).

18.4. In the notation of Section 14.5.3, $r(w_i,\theta) = r(w_i,\beta) = y_i - m(x_i,\beta)$, and so $R_o(x_i) = -\nabla_\beta m(x_i,\beta_o)$. Further, $\Omega_o(x_i) = \mathrm{Var}[y_i - m(x_i,\beta_o)|x_i] = \mathrm{Var}(y_i|x_i) = \sigma_o^2 m(x_i,\beta_o)$ under the GLM assumption. From equation (14.60), the asymptotic variance lower bound is

$$\{E[R_o(x_i)'\Omega_o(x_i)^{-1}R_o(x_i)]\}^{-1} = \sigma_o^2\{E[\nabla_\beta m(x_i,\beta_o)'\nabla_\beta m(x_i,\beta_o)/m(x_i,\beta_o)]\}^{-1},$$

which is the same asymptotic variance for the Poisson QMLE under the GLM assumption.


18.5. a. We just use iterated expectations:

$$E(y_{it}|x_i) = E[E(y_{it}|x_i,c_i)|x_i] = E[c_i\exp(x_{it}\beta)|x_i]$$
$$= E(c_i|x_i)\exp(x_{it}\beta)$$
$$= \exp(\alpha + \bar{x}_i\gamma)\exp(x_{it}\beta) = \exp(\alpha + x_{it}\beta + \bar{x}_i\gamma).$$


b. We are explicitly testing $H_0: \gamma = 0$, but we are maintaining full independence of $c_i$ and $x_i$ under $H_0$. We have enough assumptions to derive $\mathrm{Var}(y_i|x_i)$, the $T \times T$ conditional variance matrix of $y_i$ given $x_i$ under $H_0$. First,

$$\mathrm{Var}(y_{it}|x_i) = E[\mathrm{Var}(y_{it}|x_i,c_i)|x_i] + \mathrm{Var}[E(y_{it}|x_i,c_i)|x_i]$$
$$= E[c_i\exp(x_{it}\beta)|x_i] + \mathrm{Var}[c_i\exp(x_{it}\beta)|x_i]$$
$$= \exp(\alpha + x_{it}\beta) + \tau^2[\exp(x_{it}\beta)]^2,$$

where $\tau^2 \equiv \mathrm{Var}(c_i)$ and we have used $E(c_i|x_i) = \exp(\alpha)$ under $H_0$. A similar, general expression holds for conditional covariances:

$$\mathrm{Cov}(y_{it},y_{ir}|x_i) = E[\mathrm{Cov}(y_{it},y_{ir}|x_i,c_i)|x_i] + \mathrm{Cov}[E(y_{it}|x_i,c_i),E(y_{ir}|x_i,c_i)|x_i]$$
$$= 0 + \mathrm{Cov}[c_i\exp(x_{it}\beta),c_i\exp(x_{ir}\beta)|x_i]$$
$$= \tau^2\exp(x_{it}\beta)\exp(x_{ir}\beta).$$


So, under $H_0$, $\mathrm{Var}(y_i|x_i)$ depends on $\alpha$, $\beta$, and $\tau^2$, all of which we can estimate. It is natural to use a score test (actually, its variable addition counterpart) to test $H_0: \gamma = 0$. First, obtain consistent estimators $\hat{\alpha}$, $\hat{\beta}$ by, say, pooled Poisson QMLE. Let $\hat{y}_{it} = \exp(\hat{\alpha} + x_{it}\hat{\beta})$ and $\hat{u}_{it} = y_{it} - \hat{y}_{it}$. A consistent estimator of $\tau^2$ can be obtained from a simple pooled regression, through the origin, of

$$\hat{u}_{it}^2 - \hat{y}_{it} \text{ on } \exp(2x_{it}\hat{\beta}), \quad t = 1,\ldots,T;\ i = 1,\ldots,N.$$

Let $\hat{\tau}^2$ be the coefficient on $\exp(2x_{it}\hat{\beta})$. It is consistent for $\tau^2$ because, under $H_0$,

$$E(u_{it}^2|x_i) = \exp(\alpha + x_{it}\beta) + \tau^2[\exp(x_{it}\beta)]^2,$$

where $u_{it} \equiv y_{it} - E(y_{it}|x_{it})$. We could also use the many covariance terms in estimating $\tau^2$ because $E(u_{it}u_{ir}|x_i) = \tau^2\exp(x_{it}\beta)\exp(x_{ir}\beta)$, $t \neq r$. So for all $t, r = 1,\ldots,T$, we can write

$$u_{it}u_{ir} - d_{tr}\exp(\alpha + x_{it}\beta) = \tau^2\exp(x_{it}\beta)\exp(x_{ir}\beta) + v_{itr},$$

where $E(v_{itr}|x_i) = 0$ and $d_{tr} \equiv 1[t = r]$ is a dummy variable. The pooled regression would be

$$\hat{u}_{it}\hat{u}_{ir} - d_{tr}\hat{y}_{it} \text{ on } \exp(x_{it}\hat{\beta})\exp(x_{ir}\hat{\beta}).$$

Next, we construct the $T \times T$ weighting matrix for observation $i$, as in Section 18.7.3. The matrix $W_i(\hat{\delta}) = W(x_i,\hat{\delta})$ has diagonal elements

$$\exp(\hat{\alpha} + x_{it}\hat{\beta}) + \hat{\tau}^2\exp(2x_{it}\hat{\beta}), \quad t = 1,\ldots,T$$

and off-diagonal elements

$$\hat{\tau}^2\exp(x_{it}\hat{\beta})\exp(x_{ir}\hat{\beta}), \quad t \neq r.$$

Using this weighting matrix in a MWNLS estimation problem we can simply add the time averages, $\bar{x}_i$, as an additional set of explanatory variables, and test their joint significance. This is the VAT version of the score test.
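
The pooled regression through the origin that delivers the estimate of $\tau^2$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tau2_hat` and the artificial inputs are hypothetical, and in practice the fitted means and residuals would come from the pooled Poisson QMLE.

```python
import math

def tau2_hat(u_hat, y_hat, xb_hat):
    """Slope from the pooled regression through the origin of
    (u_hat**2 - y_hat) on exp(2 * xb_hat), pooled over all (i, t)."""
    num = 0.0
    den = 0.0
    for u, yh, xb in zip(u_hat, y_hat, xb_hat):
        regressor = math.exp(2.0 * xb)
        num += regressor * (u * u - yh)
        den += regressor * regressor
    return num / den

# Artificial check: build residuals so that u^2 - y_hat = 0.5 * exp(2 * xb) exactly.
xb = [0.1, 0.4, -0.2, 0.3]          # hypothetical values of x_it * beta_hat
alpha = 0.2                          # hypothetical intercept
y_fit = [math.exp(alpha + v) for v in xb]
u = [math.sqrt(yf + 0.5 * math.exp(2.0 * v)) for yf, v in zip(y_fit, xb)]
print(tau2_hat(u, y_fit, xb))
```

Because the artificial residuals satisfy the moment condition exactly with $\tau^2 = 0.5$, the function recovers that value up to floating-point error.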

In practice, we might want a robust form of the test that does not require $\mathrm{Var}(y_i|x_i) = W(x_i,\delta)$ under $H_0$, where $W(x_i,\delta)$ is the matrix described above. We can just use the fully robust variance matrix reported at the bottom of page 761.

Using modern software that supports MWNLS, a simpler approach is to estimate the model under the alternative and obtain a Wald test of $H_0: \gamma = 0$, where it is valid to act as if $\mathrm{Var}(c_i|x_i) = \tau^2$ because this is true under the null. This would differ from the score approach in that $\tau^2$ would be estimated using a first stage where $\gamma$ is also estimated. A fully robust Wald test is easy to obtain if we have any doubts about the variance-covariance structure.

Incidentally, this variance-covariance structure is different from the one used in the GEE 
literature for Poisson regression. With GEE and an exchangeable correlation structure, the 


nominal variance would be 
$$\mathrm{Var}(y_{it}|x_i) = \exp(\alpha + x_{it}\beta + \bar{x}_i\gamma)$$


and the nominal covariances 


Cov(vit, VirlXi) = Pf exP(XiB) exp(xirB) . 


c. If we assume (18.83), (18.84) and $c_i = a_i\exp(\alpha + \bar{x}_i\gamma)$ where $a_i|x_i \sim \mathrm{Gamma}(\delta,\delta)$, then testing involves estimation of a Poisson panel data model under random effects assumptions. Under these assumptions, we have

$$y_{it}|x_i,a_i \sim \mathrm{Poisson}[a_i\exp(\alpha + x_{it}\beta + \bar{x}_i\gamma)]$$
$$y_{it}, y_{ir} \text{ are independent conditional on } (x_i,a_i),\ t \neq r$$
$$a_i|x_i \sim \mathrm{Gamma}(\delta,\delta).$$

In other words, the full set of random effect Poisson assumptions holds, but where the mean function in the Poisson distribution is $a_i\exp(\alpha + x_{it}\beta + \bar{x}_i\gamma)$. In practice, we just add the (nonredundant elements of) $\bar{x}_i$ in each time period, along with a constant and $x_{it}$, and carry out a random effects Poisson analysis. We can test $H_0: \gamma = 0$ using the LR, Wald, or score approaches. Any of these would be asymptotically efficient. None is robust to misspecification of the Poisson distribution or the conditional independence assumption because we have used a full distribution for $y_i$ given $x_i$ in the MLE analysis.

18.6. a. We know from Problem 12.6 that pooled nonlinear least squares consistently estimates $\beta_o$ when $\beta_o$ appears in correctly specified conditional means for each $t$. Because $\ddot{m}_{it}(\beta)$ depends on $x_i$ we should show

$$E(\ddot{y}_{it}|x_i) = \ddot{m}_{it}(\beta_o), \quad t = 1,\ldots,T,$$

as suggested in the hint. To this end, write

$$y_{it} = c_i + m(x_{it},\beta_o) + u_{it}, \quad E(u_{it}|x_i,c_i) = 0, \quad t = 1,\ldots,T.$$

Then subtracting off time averages gives

$$\ddot{y}_{it} = \ddot{m}_{it}(\beta_o) + \ddot{u}_{it}, \qquad \ddot{u}_{it} \equiv u_{it} - T^{-1}\sum_{r=1}^{T} u_{ir}.$$

Because $E(\ddot{u}_{it}|x_i) = 0$, $t = 1,\ldots,T$, consistency follows generally by Problem 12.6. We do have to make an assumption that ensures that $\beta_o$ is identified, which restricts the way that time-constant variables can appear in $m(x,\beta)$. (For example, additive time-constant variables get swept away by the time demeaning.)
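b. By the general theory of M-estimation, or by adapting either Problem 12.6 or 12.7, we can show

$$\sqrt{N}(\hat{\beta} - \beta_o) = A_o^{-1}N^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\nabla_\beta\ddot{m}_{it}(\beta_o)'\ddot{u}_{it} + o_p(1),$$

where

$$A_o \equiv \sum_{t=1}^{T}E[\nabla_\beta\ddot{m}_{it}(\beta_o)'\nabla_\beta\ddot{m}_{it}(\beta_o)]$$

is $P \times P$ and $P$ is the dimension of $\beta$. (As part of the identification assumption, we would assume that $A_o$ is nonsingular.) As in the linear case, we can write, for each $i$,

$$\sum_{t=1}^{T}\nabla_\beta\ddot{m}_{it}(\beta_o)'\ddot{u}_{it} = \sum_{t=1}^{T}\nabla_\beta\ddot{m}_{it}(\beta_o)'u_{it}.$$

Further, $\mathrm{Var}(y_i|x_i,c_i) = \sigma_o^2 I_T$ is the same as $E(u_iu_i'|x_i,c_i) = \sigma_o^2 I_T$, which implies

$$E(u_{it}^2|x_i) = \sigma_o^2$$
$$E(u_{it}u_{ir}|x_i) = 0, \quad t \neq r.$$

Therefore, by the usual iterated expectations argument,

$$E[u_{it}^2\nabla_\beta\ddot{m}_{it}(\beta_o)'\nabla_\beta\ddot{m}_{it}(\beta_o)] = \sigma_o^2E[\nabla_\beta\ddot{m}_{it}(\beta_o)'\nabla_\beta\ddot{m}_{it}(\beta_o)], \quad t = 1,\ldots,T$$

and

$$E[u_{it}u_{ir}\nabla_\beta\ddot{m}_{it}(\beta_o)'\nabla_\beta\ddot{m}_{ir}(\beta_o)] = 0, \quad t \neq r.$$

It follows that

$$\mathrm{Var}\left(\sum_{t=1}^{T}\nabla_\beta\ddot{m}_{it}(\beta_o)'u_{it}\right) = \sigma_o^2\sum_{t=1}^{T}E[\nabla_\beta\ddot{m}_{it}(\beta_o)'\nabla_\beta\ddot{m}_{it}(\beta_o)].$$

Therefore, under the given assumptions,

$$\mathrm{Avar}[\sqrt{N}(\hat{\beta} - \beta_o)] = \sigma_o^2\left\{\sum_{t=1}^{T}E[\nabla_\beta\ddot{m}_{it}(\beta_o)'\nabla_\beta\ddot{m}_{it}(\beta_o)]\right\}^{-1}.$$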


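As in the linear case, the tricky part is in estimating $\sigma_o^2$. We can apply virtually the same argument. Let $\hat{\ddot{u}}_{it} = \ddot{y}_{it} - \ddot{m}_{it}(\hat{\beta})$ for all $i$ and $t$. Then a consistent estimator of $\sigma_o^2$ is

$$\hat{\sigma}^2 = \frac{1}{N(T-1) - P}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{\ddot{u}}_{it}^2,$$

where the subtraction of $P$ is not needed but is often used as an adjustment for estimation of $\beta_o$. Estimation of $A_o$ gives

$$\hat{A} = N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\nabla_\beta\ddot{m}_{it}(\hat{\beta})'\nabla_\beta\ddot{m}_{it}(\hat{\beta}).$$

Then,

$$\widehat{\mathrm{Avar}}(\hat{\beta}) = \hat{\sigma}^2\left[\sum_{i=1}^{N}\sum_{t=1}^{T}\nabla_\beta\ddot{m}_{it}(\hat{\beta})'\nabla_\beta\ddot{m}_{it}(\hat{\beta})\right]^{-1}.$$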


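c. A fully robust variance matrix estimator uses $\hat{A}$ and

$$\hat{B} = N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{r=1}^{T}\hat{\ddot{u}}_{it}\hat{\ddot{u}}_{ir}\nabla_\beta\ddot{m}_{it}(\hat{\beta})'\nabla_\beta\ddot{m}_{ir}(\hat{\beta}),$$

which allows for arbitrary heteroskedasticity and serial correlation in $\{u_{it} : t = 1,\ldots,T\}$. Then $\widehat{\mathrm{Avar}}(\hat{\beta}) = \hat{A}^{-1}\hat{B}\hat{A}^{-1}/N$, as usual.

Remember that the estimator of $A_o$ relies on correct specification of the conditional mean; in its weakest form, $E(\ddot{y}_{it}|x_i) = \ddot{m}_{it}(\beta_o)$, which is implied by the model we started with, $E(y_{it}|x_i,c_i) = c_i + m(x_{it},\beta_o)$. If we want to allow the model to be misspecified we should use the full Hessian, $\nabla_\beta\ddot{m}_{it}(\hat{\beta})'\nabla_\beta\ddot{m}_{it}(\hat{\beta}) - \hat{\ddot{u}}_{it}\nabla_\beta^2\ddot{m}_{it}(\hat{\beta})$, in place of $\nabla_\beta\ddot{m}_{it}(\hat{\beta})'\nabla_\beta\ddot{m}_{it}(\hat{\beta})$.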

d. This is easy following the hint. For each $i$ and given $\beta$, $\hat{c}_i(\beta)$ is just the intercept in the simple regression of $y_{it} - m_{it}(\beta)$ on 1, $t = 1,\ldots,T$. Therefore, $\hat{c}_i(\beta) = \bar{y}_i - \bar{m}_i(\beta)$. Therefore, we can write problem (18.105), after concentrating out the $c_i$, as

$$\min_\beta\sum_{i=1}^{N}\sum_{t=1}^{T}\{y_{it} - [\bar{y}_i - \bar{m}_i(\beta)] - m_{it}(\beta)\}^2 = \min_\beta\sum_{i=1}^{N}\sum_{t=1}^{T}[\ddot{y}_{it} - \ddot{m}_{it}(\beta)]^2,$$

which is what we wanted to show. Note that by treating the $c_i$ as $N$ parameters to estimate and using a standard degrees-of-freedom adjustment, solving (18.105) does yield the estimate $\hat{\sigma}^2$ from part b when we use the sum of squared residuals over the degrees of freedom, $NT - N - P = N(T-1) - P$.
18.7. a. First, for each $t$, the density of $y_{it}$ given $(x_i = x, c_i = c)$ is

$$f(y_t|x,c;\beta_o) = \exp[-c\,m(x_t,\beta_o)][c\,m(x_t,\beta_o)]^{y_t}/y_t!, \quad y_t = 0,1,2,\ldots$$

Multiplying these together gives the joint density of $(y_{i1},\ldots,y_{iT})$ given $(x_i = x, c_i = c)$. Taking the log, plugging in the observed data for observation $i$, and dropping the factorial term gives

$$\ell_i(c_i,\beta) = \sum_{t=1}^{T}\{-c_im(x_{it},\beta) + y_{it}[\log(c_i) + \log(m(x_{it},\beta))]\}.$$
b. Taking the derivative of $\ell_i(c_i,\beta)$ with respect to $c_i$, setting the result to zero, and rearranging gives

$$(n_i/c_i) = \sum_{t=1}^{T}m(x_{it},\beta).$$

Letting $c_i(\beta)$ denote the solution as a function of $\beta$, we have $c_i(\beta) = n_i/M_i(\beta)$, where $M_i(\beta) \equiv \sum_{t=1}^{T}m(x_{it},\beta)$. The second order sufficient condition for a maximum is easily seen to hold.
c. Plugging the solution from part b into $\ell_i(c_i,\beta)$ gives

$$\ell_i[c_i(\beta),\beta] = -[n_i/M_i(\beta)]M_i(\beta) + \sum_{t=1}^{T}y_{it}\{\log[n_i/M_i(\beta)] + \log[m(x_{it},\beta)]\}$$
$$= -n_i + n_i\log(n_i) + \sum_{t=1}^{T}y_{it}\{\log[m(x_{it},\beta)/M_i(\beta)]\}$$
$$= \sum_{t=1}^{T}y_{it}\log[p_t(x_i,\beta)] + n_i[\log(n_i) - 1],$$

because $p_t(x_i,\beta) = m(x_{it},\beta)/M_i(\beta)$; see equation (18.89).

d. From part c it follows that if we maximize $\sum_{i=1}^{N}\ell_i(c_i,\beta)$ with respect to $(c_1,\ldots,c_N)$, that is, we concentrate out these parameters, we get exactly $\sum_{i=1}^{N}\ell_i[c_i(\beta),\beta]$. Except for the term $\sum_{i=1}^{N}n_i[\log(n_i) - 1]$, which does not depend on $\beta$, this is exactly the conditional log-likelihood for the conditional multinomial distribution obtained in Section 18.7.4. Therefore, this is another case where treating the $c_i$ as parameters to be estimated leads us to a $\sqrt{N}$-consistent, asymptotically normal estimator of $\beta_o$.
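
The concentration argument in parts b through d can be verified numerically for a single cross-section unit. The Python sketch below is an illustration with made-up inputs (the values of `m` and `y` are hypothetical): it checks that the Poisson log-likelihood evaluated at $c_i(\beta) = n_i/M_i(\beta)$ equals the multinomial log-likelihood plus $-n_i + n_i\log(n_i)$, a quantity free of $\beta$.

```python
import math

def poisson_loglik(c, m, y):
    """Sum over t of -c*m_t + y_t*[log(c) + log(m_t)] (factorial term dropped)."""
    return sum(-c * mt + yt * (math.log(c) + math.log(mt)) for mt, yt in zip(m, y))

def multinomial_loglik(m, y):
    """Sum over t of y_t * log(p_t), with p_t = m_t / sum(m)."""
    total = sum(m)
    return sum(yt * math.log(mt / total) for mt, yt in zip(m, y))

m = [1.5, 0.7, 2.2]      # hypothetical values of m(x_it, beta) for one unit
y = [2, 0, 3]            # hypothetical counts
n = sum(y)
c_star = n / sum(m)      # maximizer of the log-likelihood in c

concentrated = poisson_loglik(c_star, m, y)
remainder = -n + n * math.log(n)   # does not involve beta
print(concentrated, multinomial_loglik(m, y) + remainder)
```

The two printed numbers agree up to floating-point error, and perturbing `c_star` in either direction lowers `poisson_loglik`, consistent with the second order condition.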

18.8. a. Generally, there is no simple way to recover $E(y|x)$ from $E\{\log[y/(1-y)]|x\}$. In particular, if $E(w|x) = x\alpha$, it is not true that $E(y|x) = \exp(x\alpha)/[1 + \exp(x\alpha)]$. In other words, we cannot simply “undo” the log-odds transformation any more than we can undo any nonlinear transformation when trying to recover conditional means.

If we make stronger assumptions, we can recover $E(y|x)$ from $E(w|x)$. Suppose we write $w = x\alpha + v$ and assume that $v$ is independent of $x$. Assume for simplicity that $v$ is continuous with density $g(\cdot)$. Then

$$E(y|x = x^o) = \int\exp(x^o\alpha + v)/[1 + \exp(x^o\alpha + v)]\,g(v)\,dv.$$

If we parameterize $g(\cdot)$, say $g(\cdot;\rho)$, and we have a consistent estimator of $\rho$, then

$$\hat{E}(y|x = x^o) = \int\exp(x^o\hat{\alpha} + v)/[1 + \exp(x^o\hat{\alpha} + v)]\,g(v;\hat{\rho})\,dv,$$

where $\hat{\alpha}$ could be the OLS estimator from regressing $w_i$ on $x_i$ or the maximum likelihood estimator based on $D(w|x)$. If $D(v|x)$ is assumed to be normal then OLS is MLE. Even if we specify $g(\cdot;\rho)$ to be a mean-zero normal distribution, obtaining the integral is cumbersome.
There is a simpler approach that is also more robust. If we just maintain that $v$ and $x$ are independent then, by the law of large numbers, $E(y|x = x^o)$ for a given vector $x^o$ is consistently estimated by

$$N^{-1}\sum_{i=1}^{N}\exp(x^o\alpha + v_i)/[1 + \exp(x^o\alpha + v_i)],$$

where we can think of drawing random samples $\{(x_i,v_i) : i = 1,2,\ldots,N\}$. Because we cannot observe $v_i$, and we do not know $\alpha$, we operationalize this formula by replacing $\alpha$ with $\hat{\alpha}$, including computing residuals $\hat{v}_i = w_i - x_i\hat{\alpha}$, $i = 1,\ldots,N$. Then

$$\hat{E}(y|x = x^o) = N^{-1}\sum_{i=1}^{N}\exp(x^o\hat{\alpha} + \hat{v}_i)/[1 + \exp(x^o\hat{\alpha} + \hat{v}_i)].$$

This is an example of Duan’s (1983) “smearing estimate.” This estimator is consistent under the assumptions given, which do not require a full distribution but do include independence between $v$ and $x$. Obtaining analytical standard errors can be done by following Problem 12.17. Bootstrapping is also valid. Unfortunately, this approach does not work if $y$ can take on the boundary values zero or one.
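
The smearing estimate is simple to compute once the OLS coefficients and residuals from the log-odds regression are in hand. The Python sketch below is a minimal illustration; the function names and all numerical inputs are hypothetical and are not taken from any data set in the text.

```python
import math

def logistic(z):
    """Logistic function exp(z) / [1 + exp(z)]."""
    return 1.0 / (1.0 + math.exp(-z))

def smearing_mean(x0, alpha_hat, residuals):
    """Average of logistic(x0 * alpha_hat + v_hat_i) over the OLS residuals v_hat_i."""
    index = sum(xj * aj for xj, aj in zip(x0, alpha_hat))
    return sum(logistic(index + v) for v in residuals) / len(residuals)

# Hypothetical inputs: intercept plus one regressor, four residuals from the log-odds regression.
alpha_hat = [0.3, -0.5]
x0 = [1.0, 0.6]                   # index = 0.3 - 0.5 * 0.6 = 0
v_hat = [-1.0, -0.25, 0.25, 1.0]
print(smearing_mean(x0, alpha_hat, v_hat))
```

In this artificial case the index is zero and the residuals are symmetric about zero, so the average is one half. In general the smearing estimate differs from simply plugging the index into the logistic function, which is the point of part a.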
The above integral for $E(y|x = x^o)$ can be written as $E(y|x = x^o) = r(x^o\alpha)$ and so, if $v$ and $x$ are independent, then $[\partial E(y|x)/\partial x_j]/[\partial E(y|x)/\partial x_h] = \alpha_j/\alpha_h$: for continuous explanatory variables, the ratio of the partial effects equals the ratio of the parameters in the linear model for $w = \log[y/(1-y)]$.

b. The functional form $E(y|x) = \exp(x\beta)/[1 + \exp(x\beta)]$, and that implied by the assumptions in part a, are generally incompatible. However, as mentioned above, independence between $v$ and $x$ in $\log[y/(1-y)] = x\alpha + v$ implies that $\alpha_j/\alpha_h$ is the ratio of the partial effects of continuous explanatory variables $x_j$ and $x_h$. In the fractional logit model, the ratio of partial effects is $\beta_j/\beta_h$. Therefore, it can make sense to compare ratios of coefficients on continuous explanatory variables across the two procedures. But the magnitudes themselves are not generally comparable.

c. Because we have a full distribution of y given x, we should use maximum likelihood, 
just as described in Section 17.7. 

d. The functional form for $E(y|x)$, as a function of the parameters $\gamma$ and $\sigma^2$, is given in equation (17.66) where we set $a_1 = 0$, $a_2 = 1$, with the obvious change in notation:

$$E(y|x) = \{\Phi[(1 - x\gamma)/\sigma] - \Phi[(-x\gamma)/\sigma]\}x\gamma + \sigma\{\phi[(-x\gamma)/\sigma] - \phi[(1 - x\gamma)/\sigma]\} + \Phi[-(1 - x\gamma)/\sigma]$$
$$= \{\Phi[(1 - x\gamma)/\sigma] - \Phi[(-x\gamma)/\sigma]\}x\gamma + \sigma\{\phi[(-x\gamma)/\sigma] - \phi[(1 - x\gamma)/\sigma]\} + 1 - \Phi[(1 - x\gamma)/\sigma].$$

This gives yet a different functional form. Nevertheless, it is easily seen from equation (17.67) that

$$[\partial E(y|x)/\partial x_j]/[\partial E(y|x)/\partial x_h] = \gamma_j/\gamma_h,$$


so that ratios of the coefficients on continuous variables can be compared with those from part 
e. Because part b only specified a conditional mean, it does not make much sense to compare the Bernoulli quasi-log-likelihood with the Tobit log-likelihood. If we are mainly interested in $E(y|x)$, which part b essentially maintains, it makes sense to base comparisons on goodness-of-fit for $E(y|x)$. For each approach, we can compute a squared correlation between the $y_i$ and the $\hat{E}(y_i|x_i)$, where the conditional expectations are estimated using each approach. Or, we can use a sum-of-squared residuals version (and possibly adjust for degrees-of-freedom because the Tobit model has an extra mean parameter, $\sigma$).

f. We would not expect to get similar answers for the full sample, which includes observations with $y_i = 0$, and the subsample that excludes $y_i = 0$ (unless the fraction of excluded observations is small). Clearly, we cannot have both $E(y|x) = \exp(x\beta)/[1 + \exp(x\beta)]$ and $E(y|x, y > 0) = \exp(x\delta)/[1 + \exp(x\delta)]$. Moreover, there is no reason to expect the best fits to yield roughly the same parameter estimates.

g. Because we have assumed $E(y|x, y > 0) = \exp(x\delta)/[1 + \exp(x\delta)]$, we consistently estimate $\delta$ using the sample for which $0 < y_i < 1$, provided we use the Bernoulli QMLE (or NLS or weighted NLS). There is no bias from excluding the $y_i = 0$ observations because we have specified the mean for the subpopulation with $y > 0$. (We discuss sample selection issues in Chapter 19.)

h. We would use a two-part model. Let $\hat{\delta}$ be the Bernoulli QMLE from part g, using observations for which $y_i > 0$. To estimate $\eta$, we run a binary response model using the binary variable $r_i = 1[y_i > 0]$. Then $P(r_i = 1|x_i) = G(x_i\eta)$. Probably we would use a probit or logit model. Then,

$$\hat{E}(y_i|x_i) = \hat{P}(y_i > 0|x_i)\cdot\hat{E}(y_i|x_i, y_i > 0)$$
$$= G(x_i\hat{\eta})\cdot\{\exp(x_i\hat{\delta})/[1 + \exp(x_i\hat{\delta})]\}.$$
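
The two-part prediction is just the product of the two fitted pieces. The Python sketch below illustrates this under the assumption that $G(\cdot)$ is the logistic function (a probit would work equally well); the function name `two_part_mean` and the coefficient values are hypothetical.

```python
import math

def logistic(z):
    """Logistic function exp(z) / [1 + exp(z)]."""
    return 1.0 / (1.0 + math.exp(-z))

def two_part_mean(x, eta_hat, delta_hat, G=logistic):
    """Estimated E(y|x) = G(x * eta_hat) * logistic(x * delta_hat)."""
    x_eta = sum(a * b for a, b in zip(x, eta_hat))
    x_delta = sum(a * b for a, b in zip(x, delta_hat))
    return G(x_eta) * logistic(x_delta)

# Hypothetical coefficient vectors (intercept, slope) and covariate values.
x = [1.0, 2.0]
eta_hat = [0.5, 0.25]      # first part: P(y > 0 | x)
delta_hat = [-1.0, 0.5]    # second part: E(y | x, y > 0)
print(two_part_mean(x, eta_hat, delta_hat))
```

Because the two parts have separate parameter vectors, a covariate can raise the probability of a positive outcome while lowering the conditional mean, which a single-index fractional logit cannot capture.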


18.9. a. The Stata output follows. I first convert the dependent variable to be in [0, 1], 


rather than [0, 100]; this is needed to estimate a fractional response model. 


The coefficient on ACT means that five more points on the ACT test, other things equal, is 


associated with a lower attendance rate of about .017(5) =.085, or 8.5 percentage points. For 


priGPA, another point on the GPA (a large change) is associated with an attendance rate 


roughly 18.2 percentage points higher. 


Twelve of the fitted values are bigger than one. This is not surprising because almost 10 


percent of the students have perfect attendance rates. 
use attend 


sum atndrte 


Variable | Obs Mean Std. Dev. 


atndrte | 680 81.70956 17 .04699 


replace atndrte = atndrte/100 
(680 real changes made) 


reg atndrte ACT priGPA frosh soph

      Source |       SS       df       MS              Number of obs =
-------------+------------------------------           F(  4,   675) =
       Model |  5.95396289     4  1.48849072           Prob > F      =
    Residual |  13.7777696   675  .020411511           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |  19.7317325   679  .029059989           Root MSE      =

     atndrte |      Coef.   Std. Err.      t       [95% Conf. Interval]
-------------+---------------------------------------------------------
         ACT |  -.0169202    .001681   -10.07     -.0202207
      priGPA |   .1820163   .0112156    16.23      .1599947    .2040379
       frosh |   .0517097   .0173019     2.99      .0177377
        soph |   .0110085    .014485     0.76     -.0174327
       _cons |   .7087769   .0417257    16.99      .6268492


predict atndrteh_lin 
(option xb assumed; fitted values) 
sum atndrteh_lin

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
atndrteh_lin |       680    .8170956    .0936415   .4846666   1.086443

count if atndrteh_lin > 1
   12

count if atndrte == 1
   66


b. The GLM standard errors are given in the output. Note that 6 ~.0161. In other words, 


the usual MLE standard errors, obtained, say, from the expected Hessian of the quasi-log-likelihood, are much too large. The standard errors that account for $\sigma^2 < 1$ are given by the GLM output. (If you omit the sca(x2) option in the glm command, you get the usual MLE standard errors.)


glm atndrte ACT priGPA frosh soph, family(binomial) link(logit) sca(x2) 


note: atndrte has noninteger values 


Generalized linear models                          No. of obs      =       680
Optimization     : ML                              Residual df     =       675
                                                   Scale parameter =
Deviance         =  87.81698799                    (1/df) Deviance =  .1300992
Pearson          =  85.57283238                    (1/df) Pearson  =  .1267746

Variance function: V(u) = u*(1-u/1)                [Binomial]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             =  .6724981
Log likelihood   = -223.6493665                    BIC             = -4314.596

             |                 OIM
     atndrte |      Coef.   Std. Err.    P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------
         ACT |  -.1113802   .0113217     0.000    -.1335703   -.0891901
      priGPA |   1.244375   .0771321     0.000     1.093199    1.395552
       frosh |   .3899318    .113436     0.001     .1676013    .6122622
        soph |   .0928127   .0944066     0.326    -.0922209    .2718463
       _cons |   .7621699   .2859966     0.008      .201627    1.322713


(Standard errors scaled using square root of Pearson X2-based dispersion.) 


. di (.1268)^2 
.01607824 


c. Because the coefficient on ACT is negative, we know that an increase in ACT score, holding year and prior GPA fixed, actually reduces predicted attendance rate. The calculation below shows that for priGPA = 3.0 and frosh = soph = 0, when ACT increases from 25 to 30, the estimated fall in atndrte is about .087, or 8.7 percentage points. This is very similar to the estimate using the linear model (8.5 percentage points), which is the same for any values of the explanatory variables.


. di exp(_b[_cons] + _b[ACT]*30 + _b[priGPA]*3)/(1 + exp(_b[_cons] + _b[ACT]*30 + _b[priGPA]*3)) - exp(_b[_cons] + _b[ACT]*25 + _b[priGPA]*3)/(1 + exp(_b[_cons] + _b[ACT]*25 + _b[priGPA]*3))
-.08671822
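
The same difference in fitted attendance rates can be reproduced from the reported coefficients without Stata. The Python sketch below is an illustration only; the helper `fitted_rate` is a hypothetical name, and the coefficients are copied from the fractional logit output above.

```python
import math

def logistic(z):
    """Logistic function exp(z) / [1 + exp(z)]."""
    return 1.0 / (1.0 + math.exp(-z))

# Fractional logit coefficients from the glm output above.
b_cons, b_act, b_prigpa = 0.7621699, -0.1113802, 1.244375

def fitted_rate(act, prigpa):
    """Fitted attendance rate with frosh = soph = 0."""
    return logistic(b_cons + b_act * act + b_prigpa * prigpa)

change = fitted_rate(30, 3.0) - fitted_rate(25, 3.0)
print(round(change, 4))
```

Unlike the linear model, this change depends on where it is evaluated; repeating the calculation at a different priGPA gives a different answer.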


d. The R-squared for the linear model is about .302. For the logistic functional form, I computed the squared correlation between $atndrte_i$ and $\hat{E}(atndrte_i|x_i)$. This R-squared is about .328, and so the logistic functional form does fit better than the linear model. And, remember that the parameters in the logistic functional form are not chosen to maximize an R-squared; the linear model coefficients are chosen to maximize R-squared given the set of explanatory variables.


. predict atndrteh_log 
(option mu assumed; predicted mean atndrte) 


. corr atndrte atndrteh_log
(obs=680)

             |  atndrte atndrt~g
-------------+------------------
     atndrte |   1.0000
atndrteh_log |   0.5725   1.0000

. di .5725^2
.32775625


18.10. a. The pooled Poisson estimates, with the usual pooled standard errors that assume a unit variance-mean ratio and dynamic completeness of the conditional mean, are given below. Using these nonrobust standard errors, all lags except the first are significantly different from zero.
use patent


poisson patents y77-y81 lrnd lrnd_1 lrnd_2 lrnd_3 lrnd_4


Poisson regression                                Number of obs   =       1356
                                                  LR chi2(10)     =   68767.04
                                                  Prob > chi2     =     0.0000
Log likelihood = -12194.868                       Pseudo R2       =     0.7382

     patents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y77 |  -.0732934   .0190128    -3.85   0.000    -.1105578   -.0360291
         y78 |   -.227293   .0196925   -11.54   0.000    -.2658896   -.1886965
         y79 |    -.36251   .0196912   -18.41   0.000    -.4011041   -.3239159
         y80 |  -.7066175   .0211325   -33.44   0.000    -.7480365   -.6651985
         y81 |  -2.115567   .0331249   -63.87   0.000     -2.18049   -2.050643
        lrnd |   .4406223   .0425948    10.34   0.000      .357138    .5241066
      lrnd_1 |   .0767312   .0635969     1.21   0.228    -.0479165    .2013788
      lrnd_2 |   .2452529   .0622048     3.94   0.000     .1233337    .3671721
      lrnd_3 |  -.1557527   .0630881    -2.47   0.014    -.2794031   -.0321023
      lrnd_4 |   .1619174   .0469008     3.45   0.001     .0699936    .2538412
       _cons |   1.157326   .0191835    60.33   0.000     1.119727    1.194925


b. The standard errors computed in part a can be wrong for at least two reasons. The first is that the conditional variance, $\mathrm{Var}(y_{it}|x_{it})$, may not equal the conditional mean, $E(y_{it}|x_{it})$, where $x_{it}$ contains the current and lagged R&D spending variables. The second is that the mean may not be dynamically complete in the sense that

$$E(y_{it}|x_{it}) \neq E(y_{it}|x_{it}, y_{i,t-1}, x_{i,t-1}, \ldots).$$

A failure of dynamic completeness generally leads to serial correlation in the implied error terms, and causes the score of the partial quasi-log-likelihood function to be serially correlated.

A third reason the standard errors might not be valid is they use the expected Hessian form of the asymptotic variance. This form is incorrect if the conditional mean is misspecified.

c. The estimates below give $\hat{\sigma} \approx 4.14$, which shows that, even if we assume a constant variance-mean ratio and dynamic completeness of the conditional mean, we need to multiply all Poisson standard errors by just over four.

Now only the contemporaneous R&D variable is significant; none of the lags has a t statistic above one.


glm patents y77-y81 lrnd lrnd_1 lrnd_2 lrnd_3 lrnd_4, family(poisson) link(log) sca(x2)

Generalized linear models                          No. of obs      =      1356
Optimization     : ML                              Residual df     =      1345
                                                   Scale parameter =
Deviance         =  20618.28952                    (1/df) Deviance =  15.32958
Pearson          =  23082.45413                    (1/df) Pearson  =  17.16168

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  18.00276
Log likelihood   = -12194.86797                    BIC             =  10917.75

             |                 OIM
     patents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y77 |  -.0732934   .0787636    -0.93   0.352    -.2276672    .0810803
         y78 |   -.227293   .0815794    -2.79   0.005    -.3871858   -.0674003
         y79 |    -.36251   .0815742    -4.44   0.000    -.5223925   -.2026275
         y80 |  -.7066175    .087545    -8.07   0.000    -.8782025   -.5350325
         y81 |  -2.115567   .1372256   -15.42   0.000    -2.384524   -1.846609
        lrnd |   .4406223    .176456     2.50   0.013     .0947748    .7864698
      lrnd_1 |   .0767312   .2634608     0.29   0.771    -.4396424    .5931048
      lrnd_2 |   .2452529   .2576938     0.95   0.341    -.2598177    .7503235
      lrnd_3 |  -.1557527   .2613529    -0.60   0.551    -.6679949    .3564895
      lrnd_4 |   .1619174   .1942941     0.83   0.405    -.2188921    .5427269
       _cons |   1.157326   .0794708    14.56   0.000     1.001566    1.313086


(Standard errors scaled using square root of Pearson X2-based dispersion. ) 


di sqrt(17.16) 


4.142463 


d. The QLR statistic is just the usual LR statistic divided by $\hat{\sigma}^2 = 17.17$. The value of the unrestricted log-likelihood is $\mathcal{L}_{ur} = -12{,}194.87$. The value of the restricted log-likelihood (without any of the lags), using the same set of years in estimation (1976 to 1981), is $\mathcal{L}_r = -12{,}252.37$. Therefore,

$$QLR = 2(12{,}252.37 - 12{,}194.87)/17.17 = 6.70.$$

With four degrees of freedom in a chi-square distribution, this leads to p-value = .153. The lags are jointly insignificant at the usual 5% level. The usual LR statistic is 115, which (incorrectly) implies very strong statistical significance for the lags.


e. The Stata results are below. With the fully robust standard errors, the contemporaneous term and the second lag are marginally significant. The robust Wald test for the exclusion of the four lags gives p-value = .494. The fully robust standard errors are clearly larger than the Poisson MLE standard errors, but they are actually smaller in some cases than the GLM standard errors from part c. The four lags are jointly insignificant.


glm patents y77-y81 lrnd lrnd_1 lrnd_2 lrnd_3 lrnd_4, family(poisson) link(log) robust cluster(cusip)


Generalized linear models                          No. of obs      =      1356
Optimization     : ML                              Residual df     =      1345
                                                   Scale parameter =
Deviance         =  20618.28952                    (1/df) Deviance =  15.32958
Pearson          =  23082.45413                    (1/df) Pearson  =  17.16168

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  18.00276
Log pseudolikelihood = -12194.86797                BIC             =  10917.75

                                 (Std. Err. adjusted for 226 clusters in cusip)
             |               Robust
     patents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y77 |  -.0732934   .0317955    -2.31   0.021    -.1356115   -.0109754
         y78 |   -.227293   .0499251    -4.55   0.000    -.3251445   -.1294416
         y79 |    -.36251   .0681543    -5.32   0.000      -.49609     -.22893
         y80 |  -.7066175   .0667816   -10.58   0.000     -.837507    -.575728
         y81 |  -2.115567   .1140381   -18.55   0.000    -2.339077   -1.892056
        lrnd |   .4406223   .2409156     1.83   0.067    -.0315637    .9128083
      lrnd_1 |   .0767312   .1228435     0.62   0.532    -.1640376       .3175
      lrnd_2 |   .2452529   .1411443     1.74   0.082    -.0313848    .5218906
      lrnd_3 |  -.1557527   .2160959    -0.72   0.471     -.579293    .2677875
      lrnd_4 |   .1619174   .2679931     0.60   0.546    -.3633395    .6871743
       _cons |   1.157326   .2061445     5.61   0.000     .7532903    1.561362

test lrnd_1 lrnd_2 lrnd_3 lrnd_4

 ( 1)  [patents]lrnd_1 = 0
 ( 2)  [patents]lrnd_2 = 0
 ( 3)  [patents]lrnd_3 = 0
 ( 4)  [patents]lrnd_4 = 0

           chi2(  4) =    3.40
         Prob > chi2 =    0.4937



f. The estimated long run elasticity is about .441 + .077 + .245 - .156 + .162 = .769. The lincom command in Stata provides a simple way to obtain a fully robust standard error. Its fully robust standard error is about .072, which gives a 95% confidence interval from about .627 to .910. As is often the case in distributed lag models, we cannot estimate the lag distribution very precisely but we can get a fairly precise estimate of the long run effect.
. lincom lrnd + lrnd_1 + lrnd_2 + lrnd_3 + lrnd_4

 ( 1)  [patents]lrnd + [patents]lrnd_1 + [patents]lrnd_2 + [patents]lrnd_3
       + [patents]lrnd_4 = 0
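
The long-run elasticity and its confidence interval follow directly from the reported coefficients and the robust standard error quoted in part f. The Python sketch below is only an arithmetic illustration; it takes the standard error of .072 from the text as given rather than recomputing it, and uses 1.96 as the normal critical value.

```python
# Pooled Poisson coefficients on lrnd and its four lags, from the output above.
coefs = [0.4406223, 0.0767312, 0.2452529, -0.1557527, 0.1619174]
long_run = sum(coefs)

se_robust = 0.072            # fully robust standard error quoted in the text
z_crit = 1.96                # approximate 97.5th percentile of the standard normal
ci_low = long_run - z_crit * se_robust
ci_high = long_run + z_crit * se_robust
print(round(long_run, 3), round(ci_low, 3), round(ci_high, 3))
```

Small differences in the last digit relative to the interval in the text arise because the standard error is rounded to three decimals here.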


g. The fixed effects Poisson estimates are given below. The contemporaneous spending 
term and second lag have much smaller effects now, while lags three and four become larger 
and even statistically significant — but with the third lag still having a large, negative 
coefficient. When we use the fully robust standard errors, only the second lag is statistically 
significant at conventional levels, although the third and fourth lags are close. 

The estimated long-run elasticity is now only .261 and it is, at best, marginally significant with t = 1.60.


. xtpqml patents y77-y81 lrnd lrnd_1 lrnd_2 lrnd_3 lrnd_4, fe
note: 8 groups (48 obs) dropped because of all zero outcomes

Conditional fixed-effects Poisson regression      Number of obs      =    1308
Group variable: cusip                             Number of groups   =     218
                                                  Obs per group: min =
                                                                 avg =      6.
                                                                 max =
                                                  Wald chi2(10)      = 3002.51
Log likelihood = -2423.7694                       Prob > chi2        =  0.0000

     patents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y77 |  -.0210069   .0204558    -1.03   0.304    -.0610995    .0190856
         y78 |   -.108368   .0251005    -4.32   0.000     -.157564    -.059172
         y79 |  -.1721306   .0306902    -5.61   0.000    -.2322822   -.1119789
         y80 |  -.4468227    .039243   -11.39   0.000    -.5237375   -.3699079
         y81 |  -1.797958   .0547882   -32.82   0.000    -1.905341   -1.690575
        lrnd |   .0492403   .0558275     0.88   0.378    -.0601795      .15866
      lrnd_1 |   .0512096   .0666844     0.77   0.443    -.0794894    .1819086
      lrnd_2 |    .130944   .0662164     1.98   0.048     .0011622    .2607259
      lrnd_3 |  -.1909907   .0714669    -2.67   0.008    -.3310632   -.0509182
      lrnd_4 |   .2201799   .0703992     3.13   0.002     .0821999    .3581599

Calculating Robust Standard Errors...

     patents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
patents      |
         y77 |  -.0210069    .026186    -0.80   0.422    -.0723306    .0303168
         y78 |   -.108368    .055447    -1.95   0.051    -.2170422    .0003062
         y79 |  -.1721306    .071949    -2.39   0.017     -.313148   -.0311131
         y80 |  -.4468227   .0829316    -5.39   0.000    -.6093657   -.2842797
         y81 |  -1.797958   .1380887   -13.02   0.000    -2.068607   -1.527309
        lrnd |   .0492403   .0868099     0.57   0.571     -.120904    .2193845
      lrnd_1 |   .0512096   .0600491     0.85   0.394    -.0664845    .1689038
      lrnd_2 |    .130944   .0592739     2.21   0.027     .0147694    .2471187
      lrnd_3 |  -.1909907   .1066283    -1.79   0.073    -.3999783    .0179968
      lrnd_4 |   .2201799   .1431273     1.54   0.124    -.0603446    .5007043

Wald chi2(10) = 366.83                            Prob > chi2 = 0.0000


. lincom lrnd + lrnd_1 + lrnd_2 + lrnd_3 + lrnd_4


( 1) [patents]lrnd + [patents]lrnd_1 + [patents]lrnd_2 + [patents]lrnd_3 
+ [patents]lrnd_4 = 0 


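18.11. a. For each $t$, the density is

$$f(y_t|\mathbf{x}_i,c_i) = \exp(-c_i m_{it})(c_i m_{it})^{y_t}/y_t!, \quad y_t = 0,1,2,\ldots$$

Under the conditional independence assumption, the joint density of $(y_{i1},\ldots,y_{iT})$ given $(\mathbf{x}_i,c_i)$ is

$$f(y_1,\ldots,y_T|\mathbf{x}_i,c_i) = \prod_{t=1}^{T}\left[\exp(-c_i m_{it})(c_i m_{it})^{y_t}/y_t!\right] = \left(\prod_{t=1}^{T} m_{it}^{y_t}/y_t!\right) c_i^{s}\exp(-c_i M_i),$$

where $M_i = m_{i1} + \ldots + m_{iT}$ and $s = y_1 + \ldots + y_T$, for all nonnegative integers $\{y_t : t = 1,\ldots,T\}$.

b. To obtain the density of $(y_{i1},\ldots,y_{iT})$ given $\mathbf{x}_i$, say $g(y_1,\ldots,y_T|\mathbf{x}_i)$, we integrate out with respect to the distribution of $c_i$ (because $c_i$ is independent of $\mathbf{x}_i$). Therefore,

$$g(y_1,\ldots,y_T|\mathbf{x}_i) = \left(\prod_{t=1}^{T} m_{it}^{y_t}/y_t!\right)\int_0^{\infty} c^{s}\exp(-cM_i)\,[\delta^{\delta}/\Gamma(\delta)]\,c^{\delta-1}\exp(-\delta c)\,dc.$$

Next, we follow the hint, noting that the general Gamma$(\alpha,\beta)$ density has the form $h(c) = [\beta^{\alpha}/\Gamma(\alpha)]c^{\alpha-1}\exp(-\beta c)$. Now

$$\int_0^{\infty} c^{s}\exp(-cM_i)[\delta^{\delta}/\Gamma(\delta)]c^{\delta-1}\exp(-\delta c)\,dc = \int_0^{\infty} [\delta^{\delta}/\Gamma(\delta)]\,c^{s+\delta-1}\exp[-(M_i+\delta)c]\,dc$$
$$= [\delta^{\delta}/\Gamma(\delta)]\,[\Gamma(s+\delta)/(M_i+\delta)^{s+\delta}] \cdot \int_0^{\infty} [(M_i+\delta)^{s+\delta}/\Gamma(s+\delta)]\,c^{s+\delta-1}\exp[-(M_i+\delta)c]\,dc,$$

and the integrand is easily seen to be the Gamma$(s+\delta, M_i+\delta)$ density, and so it integrates to unity. Therefore, we have shown

$$g(y_1,\ldots,y_T|\mathbf{x}_i) = \left(\prod_{t=1}^{T} m_{it}^{y_t}/y_t!\right)[\delta^{\delta}/\Gamma(\delta)]\,[\Gamma(s+\delta)/(M_i+\delta)^{s+\delta}]$$

for all nonnegative integers $\{y_t : t = 1,\ldots,T\}$.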
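As a rough numerical check of the closed form in part b (not part of the original solution), the following Python sketch compares it with a brute-force integration of the Poisson joint density against the Gamma$(\delta,\delta)$ density. The function names, the test values of $y_t$, $m_{it}$, and $\delta$, and the midpoint-rule settings are all illustrative assumptions.

```python
import math

def log_closed_form(y, m, delta):
    """Log of the closed-form mixture density from part b,
    assuming c ~ Gamma(delta, delta)."""
    s = sum(y)
    M = sum(m)
    log_prod = sum(yt * math.log(mt) - math.lgamma(yt + 1) for yt, mt in zip(y, m))
    return (log_prod + delta * math.log(delta) - math.lgamma(delta)
            + math.lgamma(s + delta) - (s + delta) * math.log(M + delta))

def numeric_mixture(y, m, delta, upper=60.0, n=50_000):
    """Midpoint-rule integral over (0, upper) of the Poisson joint density
    times the Gamma(delta, delta) density (upper and n are arbitrary choices)."""
    h = upper / n
    total = 0.0
    for k in range(n):
        c = (k + 0.5) * h
        log_pois = sum(-c * mt + yt * math.log(c * mt) - math.lgamma(yt + 1)
                       for yt, mt in zip(y, m))
        log_gam = (delta * math.log(delta) - math.lgamma(delta)
                   + (delta - 1.0) * math.log(c) - delta * c)
        total += math.exp(log_pois + log_gam) * h
    return total

if __name__ == "__main__":
    y, m, delta = [1, 0, 3], [0.5, 1.2, 2.0], 2.0   # made-up values
    print(math.exp(log_closed_form(y, m, delta)), numeric_mixture(y, m, delta))
```

The two printed numbers should agree to several digits; working on the log scale with `math.lgamma` avoids overflow when the counts are large.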


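18.12. a. First, the density of $y_{it}$ given $(\mathbf{x}_i,c_i)$ is

$$f(y_t|\mathbf{x}_i,c_i) = [(1/c_i)^{m_{it}}/\Gamma(m_{it})]\,y_t^{m_{it}-1}\exp(-y_t/c_i), \quad y_t > 0.$$

Following the hint, the density of the sum, $s_i$, given $(\mathbf{x}_i,c_i)$ is

$$g(s|\mathbf{x}_i,c_i) = [(1/c_i)^{M_i}/\Gamma(M_i)]\,s^{M_i-1}\exp[-(1/c_i)s], \quad s > 0,$$

where $M_i = m_{i1} + \ldots + m_{iT}$. Therefore, the density of $(y_{i1},\ldots,y_{iT})$ given $(s_i = s,\mathbf{x}_i,c_i)$ is

$$\left\{\prod_{t=1}^{T-1}[(1/c_i)^{m_{it}}/\Gamma(m_{it})]\,y_t^{m_{it}-1}\exp[-(1/c_i)y_t]\right\}\cdot[(1/c_i)^{m_{iT}}/\Gamma(m_{iT})]\,(s - y_1 - \ldots - y_{T-1})^{m_{iT}-1}\exp[-(1/c_i)(s - y_1 - \ldots - y_{T-1})]\Big/ g(s|\mathbf{x}_i,c_i)$$
$$= \frac{c_i^{-M_i}\left(\prod_{t=1}^{T}\Gamma(m_{it})\right)^{-1}\left(\prod_{t=1}^{T} y_t^{m_{it}-1}\right)\exp[-(1/c_i)s]}{c_i^{-M_i}\,\Gamma(M_i)^{-1}\,s^{M_i-1}\exp[-(1/c_i)s]}$$
$$= \Gamma(M_i)\left(\prod_{t=1}^{T}\Gamma(m_{it})\right)^{-1}\left(\prod_{t=1}^{T} y_t^{m_{it}-1}\right)\Big/ s^{M_i-1},$$

where $y_T = s - y_1 - \ldots - y_{T-1}$, which is what we wanted to show. Note how $c_i$ has dropped out of the density.

b. The conditional log-likelihood for observation $i$ is

$$\ell_i(\boldsymbol\beta) = \log\{\Gamma[M_i(\boldsymbol\beta)]\} - \sum_{t=1}^{T}\log\{\Gamma[m_{it}(\boldsymbol\beta)]\} + \sum_{t=1}^{T}[m_{it}(\boldsymbol\beta) - 1]\log(y_{it}) - [M_i(\boldsymbol\beta) - 1]\log(y_{i1} + \ldots + y_{iT}),$$

where $m_{it}(\boldsymbol\beta) = m_t(\mathbf{x}_{it},\boldsymbol\beta)$ and $M_i(\boldsymbol\beta) = \sum_{t=1}^{T} m_{it}(\boldsymbol\beta)$. We can sum across all $i$ and maximize the resulting log-likelihood with respect to $\boldsymbol\beta$ to obtain the fixed effects gamma estimator. The asymptotic theory is standard, provided the regression functions are smooth functions of $\boldsymbol\beta$ and depend on the covariates in such a way that $\boldsymbol\beta_o$ is identified.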
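18.13. a. Plug in the data, a generic value $\boldsymbol\beta$, and take the natural log:

$$\ell_i(\boldsymbol\beta) = \mathbf{x}_i\boldsymbol\beta + \log(y_i)[\exp(\mathbf{x}_i\boldsymbol\beta) - 1].$$

Notice that this is not a member of the linear exponential family (because it is $\log(y_i)$, not $y_i$, that appears).

b. The gradient is

$$\nabla_{\beta}\ell_i(\boldsymbol\beta) = \mathbf{x}_i + \log(y_i)\mathbf{x}_i\exp(\mathbf{x}_i\boldsymbol\beta)$$

and taking the transpose gives

$$\mathbf{s}_i(\boldsymbol\beta) = \mathbf{x}_i' + \log(y_i)\mathbf{x}_i'\exp(\mathbf{x}_i\boldsymbol\beta) = \mathbf{x}_i'[1 + \log(y_i)\exp(\mathbf{x}_i\boldsymbol\beta)].$$

c. Because $0 < y_i < 1$, $\log(y_i) < 0$ for all $i$. Therefore, we know $E[\log(y_i)|\mathbf{x}_i] < 0$ for any outcome $\mathbf{x}_i$.

d. We use part c:

$$E[\mathbf{s}_i(\boldsymbol\beta_o)|\mathbf{x}_i] = \mathbf{x}_i'\{1 + E[\log(y_i)|\mathbf{x}_i]\exp(\mathbf{x}_i\boldsymbol\beta_o)\} = \mathbf{x}_i'\{1 - \exp(-\mathbf{x}_i\boldsymbol\beta_o)\exp(\mathbf{x}_i\boldsymbol\beta_o)\} = \mathbf{0}.$$

e. Using $\mathbf{s}_i(\boldsymbol\beta) = \mathbf{x}_i'[1 + \log(y_i)\exp(\mathbf{x}_i\boldsymbol\beta)]$, the Hessian is

$$\mathbf{H}_i(\boldsymbol\beta) = \nabla_{\beta}\mathbf{s}_i(\boldsymbol\beta) = \mathbf{x}_i'\mathbf{x}_i\log(y_i)\exp(\mathbf{x}_i\boldsymbol\beta)$$

and so

$$E[\mathbf{H}_i(\boldsymbol\beta_o)|\mathbf{x}_i] = \mathbf{x}_i'\mathbf{x}_i E[\log(y_i)|\mathbf{x}_i]\exp(\mathbf{x}_i\boldsymbol\beta_o) = -\mathbf{x}_i'\mathbf{x}_i\exp(-\mathbf{x}_i\boldsymbol\beta_o)\exp(\mathbf{x}_i\boldsymbol\beta_o) = -\mathbf{x}_i'\mathbf{x}_i,$$

and so

$$-E[\mathbf{H}_i(\boldsymbol\beta_o)|\mathbf{x}_i] = \mathbf{x}_i'\mathbf{x}_i.$$

f. Given part e, the formula based on the expected Hessian is easiest:

$$\mathrm{Avar}[\sqrt{N}(\hat{\boldsymbol\beta} - \boldsymbol\beta_o)] = [E(\mathbf{x}_i'\mathbf{x}_i)]^{-1}$$

and so

$$\widehat{\mathrm{Avar}}[\sqrt{N}(\hat{\boldsymbol\beta} - \boldsymbol\beta_o)] = \left(N^{-1}\sum_{i=1}^{N}\mathbf{x}_i'\mathbf{x}_i\right)^{-1} = (\mathbf{X}'\mathbf{X}/N)^{-1}.$$

g. From part d, we see that the key condition for Fisher consistency, that is, to make $E[\mathbf{s}_i(\boldsymbol\beta_o)|\mathbf{x}_i] = \mathbf{0}$, is that

$$E[\log(y_i)|\mathbf{x}_i] = -\exp(-\mathbf{x}_i\boldsymbol\beta_o).$$

In other words, the implied model for $E[\log(y_i)|\mathbf{x}_i]$ must be correct. Unfortunately, having $E(y_i|\mathbf{x}_i)$ correctly specified, that is, $E(y_i|\mathbf{x}_i) = \exp(\mathbf{x}_i\boldsymbol\beta_o)/[1 + \exp(\mathbf{x}_i\boldsymbol\beta_o)]$, generally says nothing about $E[\log(y_i)|\mathbf{x}_i]$.

h. We could use the Bernoulli QMLE to estimate the parameters in $E(y_i|\mathbf{x}_i) = \exp(\mathbf{x}_i\boldsymbol\beta_o)/[1 + \exp(\mathbf{x}_i\boldsymbol\beta_o)]$ directly, without extra assumptions about $D(y_i|\mathbf{x}_i)$.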
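The zero-mean score condition in parts d and g can be illustrated by simulation. The sketch below assumes the density implied by the log-likelihood in part a, namely $f(y|\mathbf{x}) = \exp(\mathbf{x}\boldsymbol\beta)\,y^{\exp(\mathbf{x}\boldsymbol\beta)-1}$ on $(0,1)$, and draws from it by inverting the cdf $y^{a}$ with $a = \exp(\mathbf{x}\boldsymbol\beta)$. The function name, index value, sample size, and seed are illustrative assumptions.

```python
import math
import random

def simulate_mean_log_y(xb, n=200_000, seed=12345):
    """Monte Carlo estimate of E[log(y)|x] when y has density a*y**(a-1)
    on (0,1) with a = exp(xb); y is drawn as u**(1/a) for u uniform."""
    rng = random.Random(seed)
    a = math.exp(xb)
    total = 0.0
    for _ in range(n):
        u = 1.0 - rng.random()       # uniform on (0, 1], so log(u) is finite
        total += math.log(u) / a     # log(y) = log(u)/a
    return total / n

if __name__ == "__main__":
    xb = 0.3                          # hypothetical value of the index x*beta
    m = simulate_mean_log_y(xb)
    print("simulated E[log y]:", m, " theory:", -math.exp(-xb))
    print("average score factor 1 + E[log y]*exp(xb):", 1.0 + m * math.exp(xb))
```

If the density is misspecified so that $E[\log(y_i)|\mathbf{x}_i]$ differs from $-\exp(-\mathbf{x}_i\boldsymbol\beta_o)$, the second printed quantity no longer centers on zero, which is the point of part g.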

18.14. a. The Stata output is given below with the three sets of standard errors asked for in the problem. The inference starts from the least robust and ends with the most robust.

The difference in standard errors is striking. The standard errors that effectively maintain a binomial distribution (at least its first two moments) lead to huge $t$ statistics. When we allow for a scale factor, the so-called GLM variance assumption (18.34), the standard errors increase by at least a factor of 20. The fully robust standard errors, which allow unrestricted $\mathrm{Var}(partic_i|employ_i,\mathbf{x}_i)$, are larger still: for most coefficients they are roughly three or more times the standard errors produced under the GLM assumption. It seems pretty clear the binomial distribution does not hold in this application and that the actual conditional variance is not proportional to the nominal variance in the binomial distribution. We should therefore use the fully robust standard errors, which lead to much more modest (but still quite significant) $t$ statistics.


. glm partic mrate ltotemp age agesq sole, fam(bin employ) link(logit)

Generalized linear models                          No. of obs      =      4075
Optimization     : ML                              Residual df     =      4069
                                                   Scale parameter =
Deviance         =  2199795.239                    (1/df) Deviance =  540.6231
Pearson          =  2021563.356                    (1/df) Pearson  =  496.8207

Variance function: V(u) = u*(1-u/employ)           [Binomial]
Link function    : g(u) = ln(u/(employ-u))         [Logit]

                                                   AIC             =  544.0016
Log likelihood   = -1108397.213                    BIC             =   2165971
---------------------------------------------------------------
             |                 OIM
      partic |      Coef.   Std. Err.      z    P>|z|   [95% Conf.
-------------+-------------------------------------------------
       mrate |   .9871354   .0033855     291.   0.000        .9805
     ltotemp |  -.1386562    .000531    -261.   0.000     -.139697
         age |   .0718575   .0001669     430.   0.000     .0715305
       agesq |  -.0005512   2.82e-06    -195.   0.000    -.0005567
        sole |   .3419834    .003443      99.   0.000     .3352353
       _cons |   1.442014   .0053821     267.   0.000     1.431465
---------------------------------------------------------------


. glm partic mrate ltotemp age agesq sole, fam(bin employ) link(logit) sca(x2)

Generalized linear models                          No. of obs      =      4075
Optimization     : ML                              Residual df     =      4069
                                                   Scale parameter =
Deviance         =  2199795.239                    (1/df) Deviance =  540.6231
Pearson          =  2021563.356                    (1/df) Pearson  =  496.8207

Variance function: V(u) = u*(1-u/employ)           [Binomial]
Link function    : g(u) = ln(u/(employ-u))         [Logit]

                                                   AIC             =  544.0016
Log likelihood   = -1108397.213                    BIC             =   2165971
------------------------------------------------------------------------------
             |                 OIM
      partic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mrate |   .9871354   .0754604      13.   0.000     .8392358    1.135035
     ltotemp |  -.1386562   .0118368     -11.   0.000    -.1618559   -.1154565
         age |   .0718575   .0037193      19.   0.000     .0645678    .0791472
       agesq |  -.0005512   .0000629      -8.   0.000    -.0006744    -.000428
        sole |   .3419834   .0767418       4.   0.000     .1915723    .4923945
       _cons |   1.442014   .1199639      12.   0.000     1.206889    1.677139
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion)


. glm partic mrate ltotemp age agesq sole, fam(bin employ) link(logit) robust

Generalized linear models                          No. of obs      =      4075
Optimization     : ML                              Residual df     =      4069
                                                   Scale parameter =
Deviance         =  2199795.239                    (1/df) Deviance =  540.6231
Pearson          =  2021563.356                    (1/df) Pearson  =  496.8207

Variance function: V(u) = u*(1-u/employ)           [Binomial]
Link function    : g(u) = ln(u/(employ-u))         [Logit]

                                                   AIC             =  544.0016
Log pseudolikelihood = -1108397.213                BIC             =   2165971
------------------------------------------------------------------------------
             |               Robust
      partic |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mrate |   .9871354   .2622177     3.76   0.000     .4731982    1.501073
     ltotemp |  -.1386562   .0546138    -2.54   0.011    -.2456972   -.0316151
         age |   .0718575   .0142656     5.04   0.000     .0438974    .0998176
       agesq |  -.0005512   .0001746    -3.16   0.002    -.0008934    -.000209
        sole |   .3419834   .1145195     2.99   0.003     .1175294    .5664375
       _cons |   1.442014   .4368904     3.30   0.001     .5857248    2.298303
------------------------------------------------------------------------------


b. The fractional logit results for prate are given below, with the same kinds of standard errors as in part a. In this case the usual MLE standard errors are too large: they impose $\sigma^2 = 1$ in (18.58), which is true in the binary case but not generally. With a fractional variable, $\sigma^2 < 1$. In fact, the estimate for this data set is $\hat\sigma^2 = .214$.

The GLM and fully robust standard errors are much closer now, with the fully robust ones typically (but not always) being slightly larger.


. glm prate mrate ltotemp age agesq sole, fam(bin) link(logit)
note: prate has noninteger values

Generalized linear models                          No. of obs      =      4075
Optimization     : ML                              Residual df     =      4069
                                                   Scale parameter =
Deviance         =  883.051611                     (1/df) Deviance =  .2170193
Pearson          =  871.5810654                    (1/df) Pearson  =  .2142003

Variance function: V(u) = u*(1-u/1)                [Binomial]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             =  .6350527
Log likelihood   = -1287.919784                    BIC             = -32941.02
------------------------------------------------------------------------------
             |                 OIM
       prate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mrate |   1.147984   .1468736     7.82   0.000     .8601167     1.43585
     ltotemp |  -.2075898   .0290032    -7.16   0.000     -.264435   -.1507446
         age |   .0481773   .0145566     3.31   0.001     .0196469    .0767077
       agesq |  -.0004519   .0004301    -1.05   0.293    -.0012948     .000391
        sole |   .1652908     .10408     1.59   0.112    -.0387022    .3692838
       _cons |   2.355715   .2299685    10.24   0.000     1.904985    2.806445
------------------------------------------------------------------------------
. glm prate mrate ltotemp age agesq sole, fam(bin) link(logit) sca(x2)
note: prate has noninteger values

Generalized linear models                          No. of obs      =      4075
Optimization     : ML                              Residual df     =      4069
                                                   Scale parameter =
Deviance         =  883.051611                     (1/df) Deviance =  .2170193
Pearson          =  871.5810654                    (1/df) Pearson  =  .2142003

Variance function: V(u) = u*(1-u/1)                [Binomial]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             =  .6350527
Log likelihood   = -1287.919784                    BIC             = -32941.02
---------------------------------------------------------------------
             |                 OIM
       prate |      Coef.   Std. Err.    P>|z|     [95% Conf. Interval]
-------------+-------------------------------------------------------
       mrate |   1.147984   .0679757    0.000     1.014754    1.281213
     ltotemp |  -.2075898   .0134232    0.000    -.2338988   -.1812808
         age |   .0481773    .006737    0.000     .0349729    .0613817
       agesq |  -.0004519    .000199    0.023     -.000842   -.0000618
        sole |   .1652908   .0481701    0.001     .0708792    .2597024
       _cons |   2.355715   .1064335    0.000     2.147109    2.564321
---------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion)


. glm prate mrate ltotemp age agesq sole, fam(bin) link(logit) robust
note: prate has noninteger values

Generalized linear models                          No. of obs      =      4075
Optimization     : ML                              Residual df     =      4069
                                                   Scale parameter =
Deviance         =  883.051611                     (1/df) Deviance =  .2170193
Pearson          =  871.5810654                    (1/df) Pearson  =  .2142003

Variance function: V(u) = u*(1-u/1)                [Binomial]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             =  .6350527
Log pseudolikelihood = -1287.919784                BIC             = -32941.02
---------------------------------------------------------------------
             |               Robust
       prate |      Coef.   Std. Err.    P>|z|     [95% Conf. Interval]
-------------+-------------------------------------------------------
       mrate |   1.147984   .0747331    0.000     1.001509    1.294458
     ltotemp |  -.2075898   .0141209    0.000    -.2352662   -.1799134
         age |   .0481773   .0061543    0.000      .036115    .0602396
       agesq |  -.0004519   .0001764    0.010    -.0007976   -.0001063
        sole |   .1652908   .0505915    0.001     .0661334    .2644483
       _cons |   2.355715   .1066441    0.000     2.146696    2.564734
---------------------------------------------------------------------


c. It makes sense to compare the coefficients in parts a and b because both approaches could be estimating the same conditional mean function for $prate_i = partic_i/employ_i$. Generally, the binomial approach starts with

$$E(y_i|\mathbf{x}_i,n_i) = n_i\Lambda(\mathbf{x}_i\boldsymbol\beta).$$

If we divide both sides by $n_i$ we get

$$\frac{E(y_i|\mathbf{x}_i,n_i)}{n_i} = \Lambda(\mathbf{x}_i\boldsymbol\beta)$$

or

$$E\left(\frac{y_i}{n_i}\,\Big|\,\mathbf{x}_i,n_i\right) = \Lambda(\mathbf{x}_i\boldsymbol\beta)$$

which, of course, implies

$$E\left(\frac{y_i}{n_i}\,\Big|\,\mathbf{x}_i\right) = \Lambda(\mathbf{x}_i\boldsymbol\beta).$$

In other words, the fractional variable $w_i = y_i/n_i$ follows a fractional response model with a logistic response function. So if we start with $E(y_i|\mathbf{x}_i,n_i) = n_i\Lambda(\mathbf{x}_i\boldsymbol\beta)$ then both methods consistently estimate $\boldsymbol\beta$.


d. The Stata output is given below. Because we want the APE on prate, we compute

$$\hat\beta_{mrate}\left(N^{-1}\sum_{i=1}^{N}\frac{\exp(\mathbf{x}_i\hat{\boldsymbol\beta})}{[1 + \exp(\mathbf{x}_i\hat{\boldsymbol\beta})]^2}\right)$$

for both sets of estimates. For the binomial QMLE the estimate is about .147. For the Bernoulli QMLE, the estimate is about .130. Incidentally, the linear regression estimate (the coefficient on mrate) is about .106, so quite a bit below the other two.

. qui glm partic mrate ltotemp age agesq sole, fam(bin employ) link(logit)
. predict xb_bin, xb
. gen sca_bin = exp(xb_bin)/((1 + exp(xb_bin))^2)
. sum sca_bin

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     sca_bin |      4075    .1492273    .0602467   .0091082   .2499969

. di .1492273*_b[mrate]
.14730755

. qui glm prate mrate ltotemp age agesq sole, fam(bin) link(logit)
. predict xb_ber, xb
. gen sca_ber = exp(xb_ber)/((1 + exp(xb_ber))^2)

. di sca_ber*_b[mrate]
.13000441


e. The Stata output is given below. The APE is about .038. If we use the linear model, we would get .106(.25) ≈ .027, so somewhat less.
. qui glm prate mrate ltotemp age agesq sole, fam(bin) link(logit) 


. gen xb_p50 = xb_ber - _b[mrate]*mrate + _b[mrate]*.5 


. gen xb_p25 = xb_ber - _b[mrate]*mrate + _b[mrate]*.25 


. gen phat_p50 = exp(xb_p50 )/(1 + exp(xb_p50 )) 


. gen phat_p25 = exp(xb_p25 )/(1 + exp(xb_p25 )) 
. gen diff = phat_p50 - phat_p25 


. sum diff 
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        diff |      4075    .0375776    .0107886   .0116919   .0689509
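For readers working outside Stata, the two average partial effect calculations in parts d and e can be sketched in a few lines of Python. This is only an illustration under the logistic mean assumption: the function names are hypothetical, and the inputs would be the fitted indices $\mathbf{x}_i\hat{\boldsymbol\beta}$ and the mrate values from whatever software produced the estimates.

```python
import math

def logistic(z):
    """Logistic cdf, Lambda(z) = exp(z)/(1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def ape_continuous(xb_values, coef):
    """coef times the sample average of exp(xb)/(1 + exp(xb))^2 (part d)."""
    scale = sum(logistic(z) * (1.0 - logistic(z)) for z in xb_values) / len(xb_values)
    return coef * scale

def ape_discrete(xb_values, x_values, coef, x0, x1):
    """Average change in the fitted mean when the covariate is moved
    from x0 to x1 for every observation (part e)."""
    diffs = []
    for xb, x in zip(xb_values, x_values):
        base = xb - coef * x                      # index with the covariate removed
        diffs.append(logistic(base + coef * x1) - logistic(base + coef * x0))
    return sum(diffs) / len(diffs)

if __name__ == "__main__":
    # Reproduces the arithmetic in part d from the reported mean scale factor.
    print(0.1492273 * 0.9871354)
    # Toy inputs, not the 401(k) data.
    print(ape_continuous([0.0, 1.0, -0.5], 1.147984))
    print(ape_discrete([0.0, 1.0, -0.5], [0.2, 0.6, 0.1], 1.147984, 0.25, 0.50))
```

The identity $\Lambda(z)[1 - \Lambda(z)] = \exp(z)/[1 + \exp(z)]^2$ is used for the scale factor because it is numerically more stable for large $|z|$.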


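18.15. a. We can just use the usual fixed effects or first-differencing estimators. If we define $w_{it} = \log[y_{it}/(1 - y_{it})]$ then we have

$$w_{it} = \mathbf{x}_{it}\boldsymbol\beta + c_i + u_{it}$$
$$E(u_{it}|\mathbf{x}_i,c_i) = 0, \quad t = 1,\ldots,T,$$

which means the key strict exogeneity assumption on $\{\mathbf{x}_{it} : t = 1,\ldots,T\}$ holds. Of course, we could use a GLS version of FE or FD, or use Chamberlain's approach.

b. Because $\log[y_{it}/(1 - y_{it})] = \mathbf{x}_{it}\boldsymbol\beta + v_{it}$,

$$y_{it}/(1 - y_{it}) = \exp(\mathbf{x}_{it}\boldsymbol\beta + v_{it})$$

and so

$$\frac{1 - y_{it}}{y_{it}} = \frac{1}{\exp(\mathbf{x}_{it}\boldsymbol\beta + v_{it})}$$

or

$$\frac{1}{y_{it}} = 1 + \frac{1}{\exp(\mathbf{x}_{it}\boldsymbol\beta + v_{it})} = \frac{1 + \exp(\mathbf{x}_{it}\boldsymbol\beta + v_{it})}{\exp(\mathbf{x}_{it}\boldsymbol\beta + v_{it})},$$

which implies

$$y_{it} = \frac{\exp(\mathbf{x}_{it}\boldsymbol\beta + v_{it})}{1 + \exp(\mathbf{x}_{it}\boldsymbol\beta + v_{it})}.$$

The ASF is defined, for each $t$, as

$$ASF_t(\mathbf{x}_t) = \int_{-\infty}^{\infty}\left[\frac{\exp(\mathbf{x}_t\boldsymbol\beta + v)}{1 + \exp(\mathbf{x}_t\boldsymbol\beta + v)}\right]g_t(v)\,dv,$$

where $g_t(\cdot)$ is the density of $v_{it}$. (Of course, allowing this density to be discrete changes the integral to a sum.) We can also write

$$ASF_t(\mathbf{x}_t) = E_{v_{it}}\left[\frac{\exp(\mathbf{x}_t\boldsymbol\beta + v_{it})}{1 + \exp(\mathbf{x}_t\boldsymbol\beta + v_{it})}\right],$$

that is, we fix the covariates at values $\mathbf{x}_t$ and average across the distribution of the unobservables, $v_{it}$.

c. The ASF cannot be estimated without further assumptions because we cannot estimate the expected value of $\exp(\mathbf{x}_t\boldsymbol\beta + v_{it})/[1 + \exp(\mathbf{x}_t\boldsymbol\beta + v_{it})]$ for given $\mathbf{x}_t$ without further assumptions.

d. By the law of iterated expectations, we have

$$ASF_t(\mathbf{x}_t) = E_{\bar{\mathbf{x}}_i}\left\{E\left[\frac{\exp(\mathbf{x}_t\boldsymbol\beta + v_{it})}{1 + \exp(\mathbf{x}_t\boldsymbol\beta + v_{it})}\,\Big|\,\bar{\mathbf{x}}_i\right]\right\} = E_{\bar{\mathbf{x}}_i}\left\{E\left[\frac{\exp(\mathbf{x}_t\boldsymbol\beta + \psi + \bar{\mathbf{x}}_i\boldsymbol\xi + r_{it})}{1 + \exp(\mathbf{x}_t\boldsymbol\beta + \psi + \bar{\mathbf{x}}_i\boldsymbol\xi + r_{it})}\,\Big|\,\bar{\mathbf{x}}_i\right]\right\}.$$

If $r_{it}$ is independent of $\mathbf{x}_i$ (and we can assume $E(a_i) = 0$ due to the presence of $\psi$), then we can consistently estimate $\boldsymbol\beta$, $\psi$, and $\boldsymbol\xi$ by pooled OLS:

$$w_{it} \text{ on } \mathbf{x}_{it},\ 1,\ \bar{\mathbf{x}}_i, \quad t = 1,\ldots,T;\ i = 1,\ldots,N.$$

(Recall this produces the FE estimator of $\boldsymbol\beta$.) Further, by independence,

$$E\left[\frac{\exp(\mathbf{x}_t\boldsymbol\beta + \psi + \bar{\mathbf{x}}_i\boldsymbol\xi + r_{it})}{1 + \exp(\mathbf{x}_t\boldsymbol\beta + \psi + \bar{\mathbf{x}}_i\boldsymbol\xi + r_{it})}\,\Big|\,\bar{\mathbf{x}}_i\right] = \int_{-\infty}^{\infty}\left[\frac{\exp(\mathbf{x}_t\boldsymbol\beta + \psi + \bar{\mathbf{x}}_i\boldsymbol\xi + r)}{1 + \exp(\mathbf{x}_t\boldsymbol\beta + \psi + \bar{\mathbf{x}}_i\boldsymbol\xi + r)}\right]f_t(r)\,dr,$$

where $f_t(\cdot)$ denotes the density of $r_{it}$. For fixed $\bar{\mathbf{x}}_i = \bar{\mathbf{x}}$, we can consistently estimate this expression as

$$N^{-1}\sum_{i=1}^{N}\frac{\exp(\mathbf{x}_t\hat{\boldsymbol\beta} + \hat\psi + \bar{\mathbf{x}}\hat{\boldsymbol\xi} + \hat r_{it})}{1 + \exp(\mathbf{x}_t\hat{\boldsymbol\beta} + \hat\psi + \bar{\mathbf{x}}\hat{\boldsymbol\xi} + \hat r_{it})},$$

where $\hat r_{it} = w_{it} - \mathbf{x}_{it}\hat{\boldsymbol\beta} - \hat\psi - \bar{\mathbf{x}}_i\hat{\boldsymbol\xi}$ are the pooled OLS residuals. To get the ASF, we need to further average out over the distribution of $\bar{\mathbf{x}}_i$, which gives

$$\widehat{ASF}_t(\mathbf{x}_t) = N^{-1}\sum_{i=1}^{N}\frac{\exp(\mathbf{x}_t\hat{\boldsymbol\beta} + \hat\psi + \bar{\mathbf{x}}_i\hat{\boldsymbol\xi} + \hat r_{it})}{1 + \exp(\mathbf{x}_t\hat{\boldsymbol\beta} + \hat\psi + \bar{\mathbf{x}}_i\hat{\boldsymbol\xi} + \hat r_{it})}.$$

We use this as usual: take derivatives and changes with respect to the elements of $\mathbf{x}_t$.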
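The final averaging step in part d is simple to code once the pooled OLS pieces are available. The sketch below is only illustrative: the function name and argument layout are assumptions, and the inputs (the index $\mathbf{x}_t\hat{\boldsymbol\beta}$ at the chosen covariate values, $\hat\psi$, the products $\bar{\mathbf{x}}_i\hat{\boldsymbol\xi}$, and the residuals $\hat r_{it}$ for the given $t$) would come from a separate estimation step.

```python
import math

def asf_estimate(x_t_beta, psi_hat, xbar_xi_hat, resid_hat):
    """Sample average over i of Lambda(x_t*beta_hat + psi_hat + xbar_i*xi_hat + r_hat_it),
    with x_t held fixed and one (xbar_i*xi_hat, r_hat_it) pair per cross-sectional unit."""
    n = len(resid_hat)
    total = 0.0
    for xbar_xi, r in zip(xbar_xi_hat, resid_hat):
        z = x_t_beta + psi_hat + xbar_xi + r
        total += 1.0 / (1.0 + math.exp(-z))   # logistic function
    return total / n

if __name__ == "__main__":
    # Toy numbers only: two units with offsetting residuals.
    print(asf_estimate(0.0, 0.0, [0.0, 0.0], [1.0, -1.0]))
```

Partial effects are then obtained by evaluating `asf_estimate` at two values of the index and differencing, or by differencing numerically for a continuous covariate.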


18.16. (Bonus Question) Consider a panel data model for $y_{it} \geq 0$ with multiplicative heterogeneity and a multiplicative idiosyncratic error:

$$y_{it} = c_i\exp(\mathbf{x}_{it}\boldsymbol\beta)r_{it}.$$

If we assume $\{\mathbf{x}_{it} : t = 1,\ldots,T\}$ is strictly exogenous then we can estimate $\boldsymbol\beta$ using the fixed effects Poisson QMLE. Instead, assume we have instruments, $\{\mathbf{z}_{it} : t = 1,2,\ldots,T\}$, that satisfy a sequential exogeneity assumption:

$$E(r_{it}|\mathbf{z}_{it},\ldots,\mathbf{z}_{i1},c_i) = E(r_{it}) = 1,$$

where setting the expected value to unity is a normalization. (As usual, $\mathbf{x}_{it}$ should probably include a full set of time period dummies.)

a. Show that we can write

$$\frac{y_{it}}{\exp(\mathbf{x}_{it}\boldsymbol\beta)} - \frac{y_{i,t+1}}{\exp(\mathbf{x}_{i,t+1}\boldsymbol\beta)} = c_i(r_{it} - r_{i,t+1}) \equiv e_{i,t+1}$$

where

$$E(e_{i,t+1}|\mathbf{z}_{it},\ldots,\mathbf{z}_{i1}) = 0, \quad t = 1,\ldots,T-1.$$

b. Part a implies that we can use the moment conditions

$$E\left[\frac{y_{it}}{\exp(\mathbf{x}_{it}\boldsymbol\beta)} - \frac{y_{i,t+1}}{\exp(\mathbf{x}_{i,t+1}\boldsymbol\beta)}\,\Big|\,\mathbf{z}_{it},\ldots,\mathbf{z}_{i1}\right] = 0, \quad t = 1,\ldots,T-1,$$

to estimate $\boldsymbol\beta$. Explain why using these moments directly can cause computational problems. (Hint: Suppose, for example, that $x_{itj} > 0$ for some $j$ and all $i$ and $t$. What would happen if $\beta_j$ is made larger and larger?)

c. Define the average of the population means across time as

$$\boldsymbol\mu_x = T^{-1}\sum_{r=1}^{T}E(\mathbf{x}_{ir}).$$

Show that if you multiply the moment conditions in part b by $\exp(\boldsymbol\mu_x\boldsymbol\beta)$, the resulting moment conditions are

$$E\left[\frac{y_{it}}{\exp[(\mathbf{x}_{it} - \boldsymbol\mu_x)\boldsymbol\beta]} - \frac{y_{i,t+1}}{\exp[(\mathbf{x}_{i,t+1} - \boldsymbol\mu_x)\boldsymbol\beta]}\,\Big|\,\mathbf{z}_{it},\ldots,\mathbf{z}_{i1}\right] = 0.$$

[See Windmeijer (2002, Economics Letters).] How does this help with the computational problem in part b?

d. What would you use in place of $\boldsymbol\mu_x$ given that $\boldsymbol\mu_x$ is unknown?

e. Suppose that $\{\mathbf{x}_{it} : t = 1,2,\ldots,T\}$ is sequentially exogenous, so that we can take $\mathbf{z}_{it} = \mathbf{x}_{it}$. Show that

$$E\left[y_{it} - \frac{y_{i,t+1}}{\exp[(\mathbf{x}_{i,t+1} - \mathbf{x}_{it})\boldsymbol\beta]}\,\Big|\,\mathbf{x}_{it},\ldots,\mathbf{x}_{i1}\right] = 0, \quad t = 1,\ldots,T-1.$$

In other words, we can write moment conditions in terms of the first difference of the explanatory variables.
Solution 

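a. From $y_{it} = c_i\exp(\mathbf{x}_{it}\boldsymbol\beta)r_{it}$ for all $t = 1,\ldots,T$ we have

$$\frac{y_{it}}{\exp(\mathbf{x}_{it}\boldsymbol\beta)} = c_i r_{it}, \qquad \frac{y_{i,t+1}}{\exp(\mathbf{x}_{i,t+1}\boldsymbol\beta)} = c_i r_{i,t+1},$$

and subtracting the second equation from the first gives

$$\frac{y_{it}}{\exp(\mathbf{x}_{it}\boldsymbol\beta)} - \frac{y_{i,t+1}}{\exp(\mathbf{x}_{i,t+1}\boldsymbol\beta)} = c_i(r_{it} - r_{i,t+1}).$$

Now

$$E[c_i(r_{it} - r_{i,t+1})|\mathbf{z}_{it},\ldots,\mathbf{z}_{i1},c_i] = c_i[E(r_{it}|\mathbf{z}_{it},\ldots,\mathbf{z}_{i1},c_i) - E(r_{i,t+1}|\mathbf{z}_{it},\ldots,\mathbf{z}_{i1},c_i)] = c_i(1 - 1) = 0,$$

where $E(r_{i,t+1}|\mathbf{z}_{it},\ldots,\mathbf{z}_{i1},c_i) = 1$ follows from sequential exogeneity at $t+1$ and iterated expectations. Another application of iterated expectations gives $E(e_{i,t+1}|\mathbf{z}_{it},\ldots,\mathbf{z}_{i1}) = 0$.

b. Suppose $x_{it1} > 0$ for all $i$ and $t$. Then $\beta_1 x_{it1} \to \infty$ as $\beta_1 \to \infty$, which means

$$\exp(\beta_1 x_{it1} + \beta_2 x_{it2} + \ldots + \beta_K x_{itK}) \to \infty$$

for all $i$ and $t$, for any values of $\beta_2,\ldots,\beta_K$. Then

$$\frac{y_{it}}{\exp(\mathbf{x}_{it}\boldsymbol\beta)} - \frac{y_{i,t+1}}{\exp(\mathbf{x}_{i,t+1}\boldsymbol\beta)} \to 0$$

as $\beta_1 \to \infty$, and so the residual function can be made closer and closer to zero by increasing $\beta_1$ without bound.

c. Multiplying the original moment conditions by $\exp(\boldsymbol\mu_x\boldsymbol\beta)$ clearly does not change the fact that they hold:

$$E\left\{\exp(\boldsymbol\mu_x\boldsymbol\beta)\left[\frac{y_{it}}{\exp(\mathbf{x}_{it}\boldsymbol\beta)} - \frac{y_{i,t+1}}{\exp(\mathbf{x}_{i,t+1}\boldsymbol\beta)}\right]\,\Big|\,\mathbf{z}_{it},\ldots,\mathbf{z}_{i1}\right\} = 0.$$

The term inside the expectation is simply

$$\frac{\exp(\boldsymbol\mu_x\boldsymbol\beta)y_{it}}{\exp(\mathbf{x}_{it}\boldsymbol\beta)} - \frac{\exp(\boldsymbol\mu_x\boldsymbol\beta)y_{i,t+1}}{\exp(\mathbf{x}_{i,t+1}\boldsymbol\beta)} = \frac{y_{it}}{\exp[(\mathbf{x}_{it} - \boldsymbol\mu_x)\boldsymbol\beta]} - \frac{y_{i,t+1}}{\exp[(\mathbf{x}_{i,t+1} - \boldsymbol\mu_x)\boldsymbol\beta]}.$$

Using these new moment conditions does not lead to the problem discussed in part b because the deviated covariates, $\mathbf{x}_{it} - \boldsymbol\mu_x$, can take on both negative and positive values.

d. We would use the sample counterpart,

$$\bar{\mathbf{x}} = T^{-1}\sum_{r=1}^{T}\left(N^{-1}\sum_{i=1}^{N}\mathbf{x}_{ir}\right) = (NT)^{-1}\sum_{i=1}^{N}\sum_{r=1}^{T}\mathbf{x}_{ir}.$$

In the sample, the deviated variables, $\mathbf{x}_{it} - \bar{\mathbf{x}}$, will always take on positive and negative values. Technically we should account for the estimation error in $\bar{\mathbf{x}}$ but it likely has a minor effect. The sample moments we would like to make close to zero have the form

$$\sum_{i=1}^{N}\sum_{t=1}^{T-1}\mathbf{g}_{it}'\left\{\frac{y_{it}}{\exp[(\mathbf{x}_{it} - \bar{\mathbf{x}})\boldsymbol\beta]} - \frac{y_{i,t+1}}{\exp[(\mathbf{x}_{i,t+1} - \bar{\mathbf{x}})\boldsymbol\beta]}\right\},$$

where $\mathbf{g}_{it} = \mathbf{g}_t(\mathbf{z}_{it},\ldots,\mathbf{z}_{i1})$ is a function of the instruments up through time $t$. Or, stack the moments over time rather than sum them up to enhance efficiency. In either case, we would use GMM with an optimal weighting matrix to set the sample moments as close to zero as possible.

e. If we can take $\mathbf{z}_{it} = \mathbf{x}_{it}$ then we know from part a that

$$E[c_i(r_{it} - r_{i,t+1})|\mathbf{x}_{it},\ldots,\mathbf{x}_{i1},c_i] = 0.$$

That means any function of $(\mathbf{x}_{it},\ldots,\mathbf{x}_{i1})$ can multiply the moment conditions and we are still left with a zero conditional mean. In particular,

$$E\left\{\exp(\mathbf{x}_{it}\boldsymbol\beta)\left[\frac{y_{it}}{\exp(\mathbf{x}_{it}\boldsymbol\beta)} - \frac{y_{i,t+1}}{\exp(\mathbf{x}_{i,t+1}\boldsymbol\beta)}\right]\,\Big|\,\mathbf{x}_{it},\ldots,\mathbf{x}_{i1}\right\} = 0, \quad t = 1,\ldots,T-1,$$

and simple algebra shows

$$\exp(\mathbf{x}_{it}\boldsymbol\beta)\left[\frac{y_{it}}{\exp(\mathbf{x}_{it}\boldsymbol\beta)} - \frac{y_{i,t+1}}{\exp(\mathbf{x}_{i,t+1}\boldsymbol\beta)}\right] = y_{it} - \frac{y_{i,t+1}}{\exp[(\mathbf{x}_{i,t+1} - \mathbf{x}_{it})\boldsymbol\beta]}.$$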
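The computational issue in part b, and the way centering addresses it in part c, can be seen with a one-regressor toy calculation. The Python sketch below is an illustration only; the function name, the scalar covariate, and the numbers are assumptions.

```python
import math

def moment_residual(y_t, y_t1, x_t, x_t1, beta, center=0.0):
    """y_t/exp[(x_t - center)*beta] - y_{t+1}/exp[(x_{t+1} - center)*beta]
    for a single scalar covariate; center = 0 gives the uncentered residual."""
    return (y_t / math.exp((x_t - center) * beta)
            - y_t1 / math.exp((x_t1 - center) * beta))

if __name__ == "__main__":
    y_t, y_t1, x_t, x_t1 = 3.0, 5.0, 1.0, 2.0     # made-up data, x > 0 throughout
    for beta in (1.0, 10.0, 50.0):
        raw = moment_residual(y_t, y_t1, x_t, x_t1, beta)
        centered = moment_residual(y_t, y_t1, x_t, x_t1, beta, center=1.5)
        print(beta, raw, centered)
```

With strictly positive covariates the uncentered residual shrinks toward zero as the coefficient grows, so a GMM objective can be driven down without fitting anything. After centering, at least one deviation is negative, and the corresponding term grows with the coefficient instead of vanishing.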
Solutions to Chapter 19 Problems
19.1. If $r_i$ is the same for any random draw $i$, then it is nonrandom. From equation (19.9), we can write

$$P(w_i = 1|\mathbf{x}_i) = \Phi[(\beta_1 + \beta_2 x_{i2} + \ldots + \beta_K x_{iK} - \log(r))/\sigma]$$
$$= \Phi[(\beta_1 - \log(r))/\sigma + (\beta_2/\sigma)x_{i2} + \ldots + (\beta_K/\sigma)x_{iK}],$$

where it is helpful to separate the intercept from the slopes. From this equation, it is clear that probit of $w_i$ on $(1,x_{i2},\ldots,x_{iK})$ consistently estimates $(\beta_1 - \log(r))/\sigma$, $\beta_2/\sigma$, ..., $\beta_K/\sigma$. Let $\alpha_1^* = (\beta_1 - \log(r))/\sigma$ and define $\beta_j^* = \beta_j/\sigma$, $j = 1,\ldots,K$. Unfortunately, we cannot recover the original parameters because, for example, $\beta_1 = \sigma\alpha_1^* + \log(r)$, and we do not know $\sigma$. Although $\alpha_1^*$ is identified, and $\log(r)$ is known, we cannot recover the scaled intercept $\beta_1^* = \beta_1/\sigma = \alpha_1^* + \log(r)/\sigma$. Of course, we directly estimate the scaled slopes, $\beta_j/\sigma$, and so we can estimate the direction of the effects on $E(y|\mathbf{x})$. But we cannot estimate the original intercepts or slopes. Assuming $\beta_h \neq 0$, we can estimate $\beta_j/\beta_h$ for $j \neq h$, which means we can estimate the relative effects. Unlike in the case where the $r_i$ vary, we cannot estimate the magnitudes of the partial effects on $E(y|\mathbf{x})$.

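19.2. a. It suffices to find the density of $\log(w_i)$ conditional on $\mathbf{x}_i$; of course we arrive at the same place for the MLEs if we work with $D(w_i|\mathbf{x}_i)$. Now

$$\log(w_i) = \max[\log(f),\log(y_i)]$$

and $\log(y_i) = \mathbf{x}_i\boldsymbol\beta + u_i$, where

$$D(u_i|\mathbf{x}_i) = \mathrm{Normal}(0,\sigma^2).$$

Let $\tilde w_i = \log(w_i)$, $\tilde f = \log(f)$, and $\tilde y_i = \log(y_i)$, so that $D(\tilde y_i|\mathbf{x}_i) = \mathrm{Normal}(\mathbf{x}_i\boldsymbol\beta,\sigma^2)$. Now

$$P(\tilde w_i = \tilde f|\mathbf{x}_i) = P(\tilde y_i \leq \tilde f|\mathbf{x}_i) = P\left(\frac{\tilde y_i - \mathbf{x}_i\boldsymbol\beta}{\sigma} \leq \frac{\tilde f - \mathbf{x}_i\boldsymbol\beta}{\sigma}\,\Big|\,\mathbf{x}_i\right)$$
$$= P\left(\frac{u_i}{\sigma} \leq \frac{\tilde f - \mathbf{x}_i\boldsymbol\beta}{\sigma}\,\Big|\,\mathbf{x}_i\right) = \Phi\left(\frac{\tilde f - \mathbf{x}_i\boldsymbol\beta}{\sigma}\right).$$

The conditional density for $\tilde w > \tilde f$ is simply the conditional density for $\tilde y_i$, that is,

$$\frac{1}{\sigma}\phi\left(\frac{\tilde w - \mathbf{x}_i\boldsymbol\beta}{\sigma}\right).$$

Therefore, the density for $\tilde w_i$ conditional on $\mathbf{x}_i$ can be written as

$$\left[\Phi\left(\frac{\tilde f - \mathbf{x}_i\boldsymbol\beta}{\sigma}\right)\right]^{1[\tilde w = \tilde f]}\left[\frac{1}{\sigma}\phi\left(\frac{\tilde w - \mathbf{x}_i\boldsymbol\beta}{\sigma}\right)\right]^{1[\tilde w > \tilde f]}.$$

It follows that the log likelihood for a random draw $i$ is

$$1[\tilde w_i = \tilde f]\log\left[\Phi\left(\frac{\tilde f - \mathbf{x}_i\boldsymbol\beta}{\sigma}\right)\right] + 1[\tilde w_i > \tilde f]\log\left[\frac{1}{\sigma}\phi\left(\frac{\tilde w_i - \mathbf{x}_i\boldsymbol\beta}{\sigma}\right)\right].$$

Notice that when $\tilde f = 0$ we get the same log likelihood as for the Type I Tobit model for corner solutions, which we covered in Chapter 17.

b. Because $u$ is independent of $\mathbf{x}$ with a Normal$(0,\sigma^2)$ distribution,

$$E[\exp(u)|\mathbf{x}] = E[\exp(u)] = \exp(\sigma^2/2),$$

where the second equality follows from the moments of a lognormal distribution. Therefore,

$$E(y|\mathbf{x}) = \exp(\mathbf{x}\boldsymbol\beta)E[\exp(u)|\mathbf{x}] = \exp(\mathbf{x}\boldsymbol\beta)\exp(\sigma^2/2) = \exp(\mathbf{x}\boldsymbol\beta + \sigma^2/2).$$

After using the MLE on the censored data to obtain $\hat{\boldsymbol\beta}$ and $\hat\sigma^2$, we can use

$$\hat E(y|\mathbf{x}) = \exp(\mathbf{x}\hat{\boldsymbol\beta} + \hat\sigma^2/2).$$

c. It is hard to see why $E(w|\mathbf{x})$ would be of much interest. In most cases the floor, $f$, is arbitrary, and so it is unclear why we would be interested in how the mean of the censored variable changes with the $x_j$. One could imagine that, if $f$ is a minimum wage and $w_i$ represents the observed wage for worker $i$, one might be interested to know how a change in a policy variable affects observed wage, on average.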

19.3. a. The two-limit Tobit model from Section 17.7 could be used with limits at 0 and 10. 

b. The lower bound of zero reflects the fact that pension contribution cannot be a negative 
percentage of income. But the upper bound of 10 percent is imposed by law, and is essentially 
arbitrary. If we defined a variable as the desired percentage put into the pension plan, then it 
could range from 0 to 100. So the upper bound of 10 can be viewed as a data censoring 
problem because some individuals presumably would contribute y > 10 if the limit were 
raised. But it depends on the purpose of the study: to estimate the effects within the current 
institutional setting or to estimate effects on pension contributions in the absence of 
constraints. 

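c. From Problem 17.3 part b, with $a_1 = 0$, we have

$$E(y|\mathbf{x}) = (\mathbf{x}\boldsymbol\beta)\cdot\{\Phi[(a_2 - \mathbf{x}\boldsymbol\beta)/\sigma] - \Phi(-\mathbf{x}\boldsymbol\beta/\sigma)\}$$
$$+\ \sigma\{\phi(\mathbf{x}\boldsymbol\beta/\sigma) - \phi[(a_2 - \mathbf{x}\boldsymbol\beta)/\sigma]\} + a_2\Phi[(\mathbf{x}\boldsymbol\beta - a_2)/\sigma].$$

Taking the derivative of this function with respect to $a_2$ gives

$$\partial E(y|\mathbf{x})/\partial a_2 = (\mathbf{x}\boldsymbol\beta/\sigma)\cdot\phi[(a_2 - \mathbf{x}\boldsymbol\beta)/\sigma] + [(a_2 - \mathbf{x}\boldsymbol\beta)/\sigma]\cdot\phi[(a_2 - \mathbf{x}\boldsymbol\beta)/\sigma]$$
$$+\ \Phi[(\mathbf{x}\boldsymbol\beta - a_2)/\sigma] - (a_2/\sigma)\phi[(\mathbf{x}\boldsymbol\beta - a_2)/\sigma]$$
$$= \Phi[(\mathbf{x}\boldsymbol\beta - a_2)/\sigma].$$

We can plug in $a_2 = 10$ to obtain the approximate effect of increasing the cap from 10 to 11. For a given value of $\mathbf{x}$, we would compute $\Phi[(\mathbf{x}\hat{\boldsymbol\beta} - 10)/\hat\sigma]$, where $\hat{\boldsymbol\beta}$ and $\hat\sigma$ are the MLEs. We might evaluate this expression at the sample average of $\mathbf{x}$ or at other interesting values (such as across gender or race).

d. If $y_i < 10$ for $i = 1,\ldots,N$, $\hat{\boldsymbol\beta}$ and $\hat\sigma$ are just the usual type I Tobit estimates with lower limit at zero: there are no observations that contribute to the third piece in the log likelihood.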
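The derivative result in part c is easy to confirm numerically. The following Python sketch codes the two-limit Tobit mean with lower limit zero and compares a finite-difference derivative with respect to the cap against $\Phi[(\mathbf{x}\boldsymbol\beta - a_2)/\sigma]$. The function names and the parameter values (index 6, $\sigma = 4$, cap 10) are hypothetical and are not estimates from any data set.

```python
import math

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def two_limit_mean(xb, sigma, a2):
    """E(y|x) for the two-limit Tobit with limits 0 and a2 (formula in part c)."""
    z1 = -xb / sigma
    z2 = (a2 - xb) / sigma
    return (xb * (norm_cdf(z2) - norm_cdf(z1))
            + sigma * (norm_pdf(z1) - norm_pdf(z2))
            + a2 * norm_cdf(-z2))

def cap_effect(xb, sigma, a2):
    """Closed-form derivative of E(y|x) with respect to the upper limit a2."""
    return norm_cdf((xb - a2) / sigma)

if __name__ == "__main__":
    xb, sigma, a2, h = 6.0, 4.0, 10.0, 1e-5       # hypothetical values
    numeric = (two_limit_mean(xb, sigma, a2 + h) - two_limit_mean(xb, sigma, a2 - h)) / (2 * h)
    print(numeric, cap_effect(xb, sigma, a2))
    # Exact change from raising the cap by one unit, for comparison.
    print(two_limit_mean(xb, sigma, 11.0) - two_limit_mean(xb, sigma, 10.0))
```

The last line shows the exact one-unit change, which is what the derivative evaluated at $a_2 = 10$ approximates.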

19.4. a. If you are interested in the effects of things like age of the building and 
neighborhood demographics on fire damage, given that a fire has occurred, then there is no 
problem. We simply need a random sample of buildings that actually caught on fire. You 
might want to supplement this with an analysis of the probability that buildings catch fire, 
given building and neighborhood characteristics. But then a two-part (hurdle) model is 
appropriate. 

b. The issue in this case is a bit subtle because it depends on the population of interest. 
One possibility is, at a given point in time, to define the population of interest to be workers 
currently enrolled in a 401(k) plan. Then using a random sample of workers already in a 401(k) 
plan is appropriate. But workers currently enrolled in a plan may not represent those that may 
be enrolled in the future. In fact, we might think of being interested in a scenario where all 
workers are enrolled. It makes sense to think about the sensitivity of contributions to the match 
rate for the population of all workers. Of course, in general, using a random sample of those 
already enrolled leads to a sample selection problem for estimating the parameters for the 
larger population — much like the problem of estimating a wage offer equation (except that, in 
addition to not observing contributions, we would not observe a match rate for those not 
enrolled). 


19.5. Because $IQ$ and $KWW$ are both indicators of $abil$ we can write

$$IQ = \xi_1 abil + a_1, \quad KWW = \xi_2 abil + a_2,$$

where $\xi_1, \xi_2 > 0$. For simplicity, I set the intercepts to zero, as this does not affect the conclusions of the problem. The structural equation is $\log(wage) = \mathbf{z}_1\boldsymbol\delta_1 + abil + v$. Now, given the selection mechanism described in Example 19.4 ($IQ$ is observed if $IQ + r \geq 0$), we can assume that

$$E(v|\mathbf{z}_1, abil, IQ, KWW, r) = 0, \quad (19.124)$$

which is the standard ignorability assumption with the added assumption that $v$ is unrelated to $r$ in the conditional mean sense. To see what else we need, write $abil$ in terms of $IQ$ and $a_1$ and plug into the structural equation to get

$$\log(wage) = \mathbf{z}_1\boldsymbol\delta_1 + \xi_1^{-1}IQ + v - \xi_1^{-1}a_1.$$

Now, we want to use $KWW$ as an instrument for $IQ$ in this equation, and use 2SLS on the selected sample. The full set of instruments is $(\mathbf{z}_1, KWW)$. From Theorem 19.1 we need the error $u = v - \xi_1^{-1}a_1$ to satisfy $E(u|\mathbf{z}_1, KWW, s) = 0$. Now, because $s$ is a function of $IQ$ and $r$, from (19.124) we have $E(v|\mathbf{z}_1, KWW, s) = 0$. To ensure $E(a_1|\mathbf{z}_1, KWW, s) = 0$ we can assume $E(a_1|\mathbf{z}_1, KWW, r) = 0$ or, equivalently, $E(a_1|\mathbf{z}_1, a_2, r) = 0$. The symmetrical assumption on $a_2$ is $E(a_2|\mathbf{z}_1, a_1, r) = 0$. Loosely, in addition to the errors in the indicator equations being uncorrelated, they are also uncorrelated with the selection error. But for all of this to work we need to make zero conditional mean assumptions.

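19.6. This is essentially given in equation (19.45), but where we allow the truncation points to depend on $\mathbf{x}_i$. Let $y_i$ given $\mathbf{x}_i$ have density $f(y|\mathbf{x}_i,\boldsymbol\beta,\boldsymbol\gamma)$, where $\boldsymbol\beta$ is the vector indexing $E(y_i|\mathbf{x}_i)$ and $\boldsymbol\gamma$ is another set of parameters (often a single variance parameter). Then the density of $y_i$ given $\mathbf{x}_i$, $s_i = 1$, when $s_i = 1[a_1(\mathbf{x}_i) < y_i < a_2(\mathbf{x}_i)]$, is

$$p(y|\mathbf{x}_i, s_i = 1) = \frac{f(y|\mathbf{x}_i;\boldsymbol\beta,\boldsymbol\gamma)}{F[a_2(\mathbf{x}_i)|\mathbf{x}_i;\boldsymbol\beta,\boldsymbol\gamma] - F[a_1(\mathbf{x}_i)|\mathbf{x}_i;\boldsymbol\beta,\boldsymbol\gamma]}, \quad a_1(\mathbf{x}_i) < y < a_2(\mathbf{x}_i).$$

In the Hausman and Wise (1977) study, $y_i = \log(income_i)$, $a_1(\mathbf{x}_i) = -\infty$, and $a_2(\mathbf{x}_i)$ was a function of family size (which determines the official poverty level).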
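19.7. a. If $E(u_1|v_2) = \gamma_1 v_2 + \gamma_2(v_2^2 - 1)$ then, because $(u_1,v_2)$ is independent of $\mathbf{x}$,

$$E(y_1|\mathbf{x},v_2) = \mathbf{x}_1\boldsymbol\beta_1 + E(u_1|v_2) = \mathbf{x}_1\boldsymbol\beta_1 + \gamma_1 v_2 + \gamma_2(v_2^2 - 1).$$

Now, using iterated expectations (since $y_2$ is a function of $(\mathbf{x},v_2)$), we have

$$E(y_1|\mathbf{x},y_2) = \mathbf{x}_1\boldsymbol\beta_1 + \gamma_1 E(v_2|\mathbf{x},y_2) + \gamma_2\{E(v_2^2|\mathbf{x},y_2) - 1\}$$
$$= \mathbf{x}_1\boldsymbol\beta_1 + \gamma_1 E(v_2|\mathbf{x},y_2) + \gamma_2\{\mathrm{Var}(v_2|\mathbf{x},y_2) + [E(v_2|\mathbf{x},y_2)]^2 - 1\}.$$

We only need these expressions for $y_2 = 1$. Using $E(v_2|v_2 > -\mathbf{x}\boldsymbol\delta_2) = \lambda(\mathbf{x}\boldsymbol\delta_2)$ and

$$\mathrm{Var}(v_2|v_2 > -\mathbf{x}\boldsymbol\delta_2) = 1 - \lambda(\mathbf{x}\boldsymbol\delta_2)[\lambda(\mathbf{x}\boldsymbol\delta_2) + \mathbf{x}\boldsymbol\delta_2],$$

we have

$$E(y_1|\mathbf{x},y_2 = 1) = \mathbf{x}_1\boldsymbol\beta_1 + \gamma_1 E(v_2|v_2 > -\mathbf{x}\boldsymbol\delta_2) + \gamma_2\{\mathrm{Var}(v_2|v_2 > -\mathbf{x}\boldsymbol\delta_2) + [E(v_2|v_2 > -\mathbf{x}\boldsymbol\delta_2)]^2 - 1\}$$
$$= \mathbf{x}_1\boldsymbol\beta_1 + \gamma_1\lambda(\mathbf{x}\boldsymbol\delta_2) + \gamma_2\{1 - \lambda(\mathbf{x}\boldsymbol\delta_2)[\lambda(\mathbf{x}\boldsymbol\delta_2) + \mathbf{x}\boldsymbol\delta_2] + [\lambda(\mathbf{x}\boldsymbol\delta_2)]^2 - 1\}$$
$$= \mathbf{x}_1\boldsymbol\beta_1 + \gamma_1\lambda(\mathbf{x}\boldsymbol\delta_2) - \gamma_2\lambda(\mathbf{x}\boldsymbol\delta_2)\,\mathbf{x}\boldsymbol\delta_2.$$

b. Now, we obtain $\hat{\boldsymbol\delta}_2$ and $\hat\lambda_{i2} = \lambda(\mathbf{x}_i\hat{\boldsymbol\delta}_2)$ after first-stage probit and then run the regression

$$y_{i1} \text{ on } \mathbf{x}_{i1},\ \hat\lambda_{i2},\ \hat\lambda_{i2}\cdot(\mathbf{x}_i\hat{\boldsymbol\delta}_2)$$

using the selected sample. We get consistent estimators of $\boldsymbol\beta_1$, $\gamma_1$, and $-\gamma_2$.

c. A standard $F$ test of joint significance of $\hat\lambda_{i2}$ and $\hat\lambda_{i2}\cdot(\mathbf{x}_i\hat{\boldsymbol\delta}_2)$ (two restrictions) in the regression from part b is a valid test, assuming homoskedasticity in the population structural model. As usual, the null is no sample selection bias.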

19.8. If we replace $y_2$ with $\hat y_2$, we need to see what happens when $y_2 = \mathbf{z}\boldsymbol\delta_2 + v_2$ is plugged into the structural model:

$$y_1 = \mathbf{z}_1\boldsymbol\delta_1 + \alpha_1\cdot(\mathbf{z}\boldsymbol\delta_2 + v_2) + u_1 = \mathbf{z}_1\boldsymbol\delta_1 + \alpha_1\cdot(\mathbf{z}\boldsymbol\delta_2) + (u_1 + \alpha_1 v_2). \quad (19.125)$$

So, the procedure is to replace $\boldsymbol\delta_2$ in (19.125) with its $\sqrt{N}$-consistent estimator, $\hat{\boldsymbol\delta}_2$. The key is to note that the error term in (19.125) is $u_1 + \alpha_1 v_2$. If the selection correction is going to work when the fitted value is plugged in for $y_2$, we need the expected value of $u_1 + \alpha_1 v_2$ given $(\mathbf{z},v_3)$ to be linear in $v_3$ (in particular, it cannot depend on $\mathbf{z}$). Then we can write

$$E(y_1|\mathbf{z},v_3) = \mathbf{z}_1\boldsymbol\delta_1 + \alpha_1\cdot(\mathbf{z}\boldsymbol\delta_2) + \gamma_1 v_3,$$

where $E[(u_1 + \alpha_1 v_2)|v_3] = \gamma_1 v_3$ by normality. Conditioning on $y_3 = 1$ gives

$$E(y_1|\mathbf{z},y_3 = 1) = \mathbf{z}_1\boldsymbol\delta_1 + \alpha_1\cdot(\mathbf{z}\boldsymbol\delta_2) + \gamma_1\lambda(\mathbf{z}\boldsymbol\delta_3). \quad (19.126)$$

A sufficient condition for (19.126) is that $(u_1,v_2,v_3)$ is independent of $\mathbf{z}$ with a trivariate normal distribution. We can get by with less than this, but the nature of $v_2$ is restricted. If we use the IV approach, rather than plugging in fitted values, we need assume nothing about $v_2$; $y_2 = \mathbf{z}\boldsymbol\delta_2 + v_2$ is just a linear projection.

As a practical matter, if we cannot write $y_2 = \mathbf{z}\boldsymbol\delta_2 + v_2$, where $v_2$ is independent of $\mathbf{z}$ and approximately normal, then the OLS alternative will not be consistent. Thus, equations where $y_2$ is binary, or is some other variable that exhibits nonnormality, cannot be consistently estimated using the OLS procedure. This is why 2SLS is generally preferred.

19.9. Here is the Stata session I used to implement Procedure 19.4, although the standard 
errors in the second step are not adjusted to account for the first-stage Tobit estimation. Still, 


¥3 is not statistically significant, and adding it is not really necessary in this application. 


tobit hours exper expersq age kidslt6 kidsge6 nwifeinc motheduc fatheduc huseduc, ll(0)

Tobit regression                                  Number of obs   =        753
                                                  LR chi2(9)      =     261.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -3823.9826                       Pseudo R2       =     0.0331

------------------------------------------------------------------------------
       hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   136.9463   17.27271     7.93   0.000     103.0373    170.8554
     expersq |  -1.947776   .5388933    -3.61   0.000    -3.005708   -.8898433
         age |  -54.78118   7.568762    -7.24   0.000    -69.63985    -39.9225
     kidslt6 |  -864.3263   111.6246    -7.74   0.000    -1083.463   -645.1896
     kidsge6 |  -24.68934   38.77122    -0.64   0.524    -100.8034    51.42468
    nwifeinc |  -5.312478   4.572888    -1.16   0.246    -14.28978    3.664822
    motheduc |   24.28791   16.74349     1.45   0.147    -8.582209    57.15803
    fatheduc |   6.566355   16.00859     0.41   0.682    -24.86103    37.99374
     huseduc |   3.129767   17.46452     0.18   0.858    -31.15583    37.41537
       _cons |   1548.141   437.1192     3.54   0.000     690.0075    2406.275
-------------+----------------------------------------------------------------
      /sigma |   1126.282   41.77533                      1044.271    1208.294
------------------------------------------------------------------------------
  Obs. summary:        325  left-censored observations at hours<=0
                       428     uncensored observations
                         0 right-censored observations
predict zd3hat
(option xb assumed; fitted values)

sum zd3hat

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      zd3hat |       753    302.7538    814.8662  -2486.756    1933.23

gen v3hat = hours - zd3hat if hours > 0
(325 missing values generated)

ivreg lwage exper expersq v3hat (educ = age kidslt6 kidsge6 nwifeinc motheduc fatheduc huseduc)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  4,   423) =    9.97
       Model |  34.6676357     4  8.66690893           Prob > F      =  0.0000
    Residual |  188.659805   423  .446004268           R-squared     =  0.1552
-------------+------------------------------           Adj R-squared =  0.1472
       Total |  223.327441   427  .523015084           Root MSE      =  .66784

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    .085618   .0213955     4.00   0.000     .0435633    .1276726
       exper |   .0378509   .0137757     2.75   0.006     .0107734    .0649283
     expersq |  -.0007453   .0004036    -1.85   0.065    -.0015386    .0000479
       v3hat |  -.0000515   .0000412    -1.25   0.211    -.0001325    .0000294
       _cons |  -.1786154   .2925231    -0.61   0.542    -.7535954    .3963645
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq v3hat age kidslt6 kidsge6 nwifeinc motheduc
               fatheduc huseduc

If we just use 2SLS on the selected sample without including $\hat{v}_3$, and the IVs for educ are motheduc, fatheduc, and huseduc, then the estimated return to education is about 8.0%:

ivreg lwage exper expersq (educ = motheduc fatheduc huseduc)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  3,   424) =   11.52
       Model |  33.3927368     3  11.1309123           Prob > F      =  0.0000
    Residual |  189.934704   424  .447959208           R-squared     =  0.1495
-------------+------------------------------           Adj R-squared =  0.1435
       Total |  223.327441   427  .523015084           Root MSE      =   .6693

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0803918    .021774     3.69   0.000     .0375934    .1231901
       exper |   .0430973   .0132649     3.25   0.001     .0170242    .0691704
     expersq |  -.0008628   .0003962    -2.18   0.030    -.0016415   -.0000841
       _cons |  -.1868572   .2853959    -0.65   0.513    -.7478242    .3741097
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc huseduc

19.10. a. Substitute the reduced forms for $y_1$ and $y_2$ into the third equation:

$$\begin{aligned} y_3 &= \max(0, \alpha_{31}(z\delta_1) + \alpha_{32}(z\delta_2) + z_3\delta_3 + v_3) \\ &= \max(0, z\pi_3 + v_3), \end{aligned}$$

where $v_3 = u_3 + \alpha_{31}v_1 + \alpha_{32}v_2$. Under the assumptions given, $v_3$ is independent of $z$ and normally distributed. Thus, if we knew $\delta_1$ and $\delta_2$, we could consistently estimate $\alpha_{31}$, $\alpha_{32}$, and $\delta_3$ from a Tobit of $y_3$ on $z\delta_1$, $z\delta_2$, and $z_3$. From the usual argument, consistent estimators are obtained by using initial consistent estimators of $\delta_1$ and $\delta_2$. Estimation of $\delta_2$ is simple: just use OLS using the entire sample. Estimation of $\delta_1$ follows exactly as in Procedure 19.3 using the system

$$y_1 = z\delta_1 + v_1$$
$$y_3 = \max(0, z\pi_3 + v_3),$$

where $y_1$ is observed only when $y_3 > 0$.

Given $\hat{\delta}_1$ and $\hat{\delta}_2$, form $z_i\hat{\delta}_1$ and $z_i\hat{\delta}_2$ for each observation $i$ in the sample. Then, obtain $\hat{\alpha}_{31}$, $\hat{\alpha}_{32}$, and $\hat{\delta}_3$ from the Tobit

$$y_{i3} \text{ on } (z_i\hat{\delta}_1),\ (z_i\hat{\delta}_2),\ z_{i3}$$

using all observations.

For identification, $(z\delta_1, z\delta_2, z_3)$ can contain no exact linear dependencies. Necessary is that there must be at least two elements in $z$ not also in $z_3$.

Obtaining the correct asymptotic variance matrix is complicated. It is most easily done in a 
generalized method of moments framework. Alternatively, it is easy to use bootstrap 
resampling on both steps of the estimation procedure. 

b. This is not very different from part a. The only difference is that 62 must be estimated 
using Procedure 19.3. Then follow the steps from part a. 

c. We need to estimate the variance of $u_3$, $\sigma_3^2$, and then use the standard formula for the mean of a Tobit model. This gives the ASF as a function of $(y_1, y_2, z_3)$ and the parameters $(\alpha_{31}, \alpha_{32}, \delta_3, \sigma_3^2)$.

19.11. a. This follows from the usual iterated expectations argument, because $Z_i$ is a function of $x_i$:

$$\begin{aligned} E[s_i Z_i' r(w_i, \theta_o)] &= E\{E[s_i Z_i' r(w_i, \theta_o)|x_i, s_i]\} \\ &= E\{s_i Z_i' E[r(w_i, \theta_o)|x_i, s_i]\} = 0 \end{aligned}$$

because $E[r(w_i, \theta_o)|x_i, s_i] = 0$.

b. We modify equation (14.24) from Chapter 14 to allow for selection:

$$\min_{\theta} \left[\sum_{i=1}^{N} s_i Z_i' r_i(\theta)\right]' \left[N^{-1}\sum_{i=1}^{N} s_i Z_i' Z_i\right]^{-1} \left[\sum_{i=1}^{N} s_i Z_i' r_i(\theta)\right].$$

For consistency, we would have to assume that rank $E(s_i Z_i' Z_i) = L$ (which means that, in the selected sample, the instrument matrix is not perfectly collinear) and we have to assume that $\theta_o$ is the unique solution to $E[s_i Z_i' r(w_i, \theta)] = 0$. For $\sqrt{N}$-asymptotic normality, we would also have to assume that rank $E[s_i Z_i' \nabla_\theta r(w_i, \theta_o)] = P$, the dimension of $\theta$. None of the conditions can be true unless $P(s_i = 1) > 0$, that is, we observe a randomly drawn observation with positive probability. But $P(s_i = 1) > 0$ is not nearly sufficient, as we might not have identification in the selected population even if we have identification in the full population. (For example, we might have an instrument that varies sufficiently in the full population but not in the $s = 1$ subpopulation.)
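c. Let $\acute{\theta}$ denote the (system) nonlinear 2SLS estimator on the selected sample. For the minimum chi-square estimator, we would compute

$$\hat{\Lambda} = N^{-1}\sum_{i=1}^{N} s_i Z_i' r_i(\acute{\theta}) r_i(\acute{\theta})' Z_i$$

and then solve

$$\min_{\theta} \left[\sum_{i=1}^{N} s_i Z_i' r_i(\theta)\right]' \hat{\Lambda}^{-1} \left[\sum_{i=1}^{N} s_i Z_i' r_i(\theta)\right].$$
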
19.12. a. Take the expected value of (19.56) conditional on $(z, y_3)$:

$$\begin{aligned} E(y_1|z, y_3) &= z_1\delta_1 + \alpha_1 E(y_2|z, y_3) + E(u_1|z, y_3) \\ &= z_1\delta_1 + \alpha_1 E(y_2|z, y_3) \end{aligned}$$

because $E(u_1|z, y_3) = 0$ follows from $E(u_1|z, v_3) = 0$.

b. Now take the expected value of (19.56) conditional on $(z, v_3)$:

$$\begin{aligned} E(y_1|z, v_3) &= z_1\delta_1 + \alpha_1 E(y_2|z, v_3) + E(u_1|z, v_3) \\ &= z_1\delta_1 + \alpha_1[z\delta_2 + E(v_2|z, v_3)] \\ &= z_1\delta_1 + \alpha_1(z\delta_2 + \gamma_2 v_3). \end{aligned}$$

Therefore,

$$E(y_1|z, y_3) = z_1\delta_1 + \alpha_1[z\delta_2 + \gamma_2 E(v_3|z, y_3)],$$

and when $y_3 = 1$ we get the usual inverse Mills ratio: $E(v_3|z, y_3 = 1) = \lambda(z\delta_3)$. So

$$E(y_1|z, y_3 = 1) = z_1\delta_1 + \alpha_1[z\delta_2 + \gamma_2\lambda(z\delta_3)].$$
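
As a rough numerical illustration of the last formula, the following sketch evaluates the inverse Mills ratio $\lambda(c) = \phi(c)/\Phi(c)$ and the implied selected-sample mean. The function names and the scalar index arguments (`z1_delta1`, `z_delta2`, `z_delta3`) are hypothetical, chosen only for this example.

```python
import math

def norm_pdf(c):
    """Standard normal density."""
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

def norm_cdf(c):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))

def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c)."""
    return norm_pdf(c) / norm_cdf(c)

def selected_mean_y1(z1_delta1, alpha1, z_delta2, gamma2, z_delta3):
    """E(y1 | z, y3 = 1) = z1*delta1 + alpha1*[z*delta2 + gamma2*lambda(z*delta3)].

    All arguments are scalar index values, assumed already computed.
    """
    return z1_delta1 + alpha1 * (z_delta2 + gamma2 * inverse_mills(z_delta3))
```

When `gamma2` is zero the correction term drops out and the function returns `z1_delta1 + alpha1 * z_delta2`.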


c. We can view it as a three-step estimation method. The first step is to obtain $\hat{\delta}_3$ from probit of $y_{i3}$ on $z_i$, using all of the observations. Then, we can estimate $\delta_2$ and $\gamma_2$ from standard Heckit applied to $y_2$ using the selected sample. (My initial thought was that the two steps in the Heckit method are treated as one, as it could be carried out by partial MLE.) Given $\hat{\delta}_3$, $\hat{\delta}_2$, and $\hat{\gamma}_2$, the final stage is the OLS regression

$$y_{i1} \text{ on } z_{i1},\ z_i\hat{\delta}_2 + \hat{\gamma}_2\lambda(z_i\hat{\delta}_3)$$

using the $s_i = 1$ sample. Note that the final regressor, $z_i\hat{\delta}_2 + \hat{\gamma}_2\lambda(z_i\hat{\delta}_3)$, is simply our estimate of $E(y_2|z, y_3 = 1)$. Intuitively, if there is one relevant element in $z_i$ not in $z_{i1}$, then $E(y_{i2}|z_i, y_{i3} = 1)$ has sufficient variation apart from $z_{i1}$ to identify $\delta_1$ and $\alpha_1$. However, I did overlook one issue when I wrote this problem: we cannot get a very good estimate of $\delta_2$, or $\gamma_2$ for that matter, in the preliminary Heckit unless we can set an element of $\delta_2$ equal to zero. In other words, we would really need an exclusion restriction in the reduced form of $y_2$ in order to get a good Heckit estimate of $\delta_2$. Thus, this procedure seems no better, and perhaps even worse, than Procedure 19.2, even when we assume $E(u_1|z, v_3) = 0$.

If $y_2$ is always observed, then we can estimate $\delta_2$ by a first-stage OLS regression, and we could then estimate $\gamma_2$ precisely, also, without resorting to an exclusion restriction in the reduced form of $y_2$.

d. Unlike Procedure 19.2, the method in part c does not work if $E(u_1|z, v_3) \neq 0$. Therefore, there is little to recommend it.

e. If $E(u_1|z, v_2, v_3) = 0$, we would just use OLS on the selected sample: $y_{i1}$ on $z_{i1}$, $y_{i2}$.

19.13. a. There is no sample selection problem because, by definition, you have specified 
the distribution of y given x and y > 0. We only need to obtain a random sample from the 
subpopulation with y > 0. 

b. Again, there is no sample selection bias because we have specified the conditional 
expectation for the population of interest. If we have a random sample from that population, 
NLS is generally consistent and $\sqrt{N}$-asymptotically normal.

c. We would use a standard probit model. Let $w = 1[y > 0]$. Then $w$ given $x$ follows a probit model with $P(w = 1|x) = \Phi(x\gamma)$.

d. $E(y|x) = P(y > 0|x) \cdot E(y|x, y > 0) = \Phi(x\gamma) \cdot \exp(x\beta)$. So we would plug in the NLS estimator of $\beta$ and the probit estimator of $\gamma$.
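
The plug-in calculation in part d can be sketched as follows. This is only an illustration: `two_part_mean` is a hypothetical helper, and `beta` and `gamma` stand in for whatever NLS and probit estimates have already been obtained.

```python
import math

def norm_cdf(c):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))

def two_part_mean(x, beta, gamma):
    """E(y|x) = Phi(x*gamma) * exp(x*beta) for one covariate row x.

    x, beta, gamma are equal-length lists; beta and gamma are assumed to be
    the NLS and probit estimates, respectively.
    """
    xb = sum(xj * bj for xj, bj in zip(x, beta))
    xg = sum(xj * gj for xj, gj in zip(x, gamma))
    return norm_cdf(xg) * math.exp(xb)
```

Because $\Phi(x\gamma) < 1$, the unconditional mean always lies below the conditional mean $\exp(x\beta)$.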

e. By definition, there is no sample selection problem when you specify the conditional distribution (or conditional mean) for the second part. As discussed in Section 17.6.3, confusion can arise when two-part models are specified with unobservables that may be correlated, as in equation (17.50):

$$y = s \cdot \exp(x\beta + u),$$
$$s = 1[x\gamma + v > 0],$$

so that $s = 0 \Leftrightarrow y = 0$. As shown in Section 17.6.3, if $u$ and $v$ are correlated then estimation of $\beta$ does use methods that are closely related to the Heckman sample selection correction. But $\beta$ does not tell us what we need to know because both $E(y|x)$ and $E(y|x, y > 0)$ are much more complicated than in the truncated normal or lognormal hurdle cases. See Section 17.6.3 for further discussion.

19.14. a. Write $u = \alpha_0(1 - s) + \alpha_1 s + e$ where, by assumption, $E(e|z, s) = 0$. Plugging this expression for $u$ into (19.30) gives

$$y = \beta_1 + \beta_2 x_2 + \ldots + \beta_K x_K + \alpha_0(1 - s) + \alpha_1 s + e$$
$$E(e|z, s) = 0.$$

Using the selected sample and applying IV corresponds to multiplying the equation through by $s$, and then applying 2SLS. We have

$$\begin{aligned} s \cdot y &= \beta_1 s + \beta_2(s \cdot x_2) + \ldots + \beta_K(s \cdot x_K) + \alpha_1 s + s \cdot e \\ &= (\alpha_1 + \beta_1)s + \beta_2(s \cdot x_2) + \ldots + \beta_K(s \cdot x_K) + s \cdot e, \end{aligned}$$

where we use $s(1 - s) = 0$ and $s^2 = s$. Because $E(s \cdot e|z, s) = 0$, it follows that, under the rank conditions in Theorem 19.1, 2SLS applied to the selected sample consistently estimates $(\alpha_1 + \beta_1), \beta_2, \ldots, \beta_K$.

b. This is not so much a "show" question as it is just recognizing a basic property of conditional expectations: if $(u, s)$ is independent of $z$, then $E(u|z, s) = E(u|s)$. Because we are willing to assume something like independence between $u$ and $z$ (or, at least, a zero conditional mean), the important assumption would be independence between $s$ and $z$. But if the mean of the unobservable, $u$, changes with $s$, why would we assume that the mean of the exogenous observables, $E(z|s)$, does not? Even $E(z|s) = E(z)$ is a strong assumption, let alone full independence between $z$ and $s$.


19.15. a. We cannot use censored Tobit because that requires observing $x$ whatever the value of $y$. Instead, we can use truncated Tobit: we use the distribution of $y$ given $x$ and $y > 0$. If we observed $x$ always then using the truncated normal regression model would be inefficient, but censored Tobit for $D(y|x)$ implies truncated Tobit for $D(y|x, y > 0)$.

b. Because we have assumed $y$ given $x$ follows a standard Tobit, $E(y|x)$ is the parametric function

$$E(y|x) = \Phi(x\beta/\sigma)x\beta + \sigma\phi(x\beta/\sigma).$$

Therefore, even though we never observe some elements of $x$ when $y = 0$, we can still estimate $E(y|x)$ because we can estimate $\beta$ and $\sigma^2$ and we have an expression for $E(y|x)$ that (we assume) holds for all $x$. To estimate $\beta$ and $\sigma^2$ we do have to assume that $x$ varies enough in the subpopulation where $y > 0$, namely, rank $E(x'x|y > 0) = K$. In the case where an element of $x$ is a derived price, we need sufficient price variation for the population that consumes some of the good.
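
A minimal sketch of the Tobit mean function in part b, together with the standard partial effect $\partial E(y|x)/\partial x_j = \Phi(x\beta/\sigma)\beta_j$. The helper names are hypothetical, and `beta` and `sigma` are assumed to come from a truncated normal regression fitted elsewhere.

```python
import math

def norm_pdf(c):
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

def norm_cdf(c):
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))

def tobit_mean(x, beta, sigma):
    """E(y|x) = Phi(xb/sigma)*xb + sigma*phi(xb/sigma) for a type I Tobit."""
    xb = sum(xj * bj for xj, bj in zip(x, beta))
    return norm_cdf(xb / sigma) * xb + sigma * norm_pdf(xb / sigma)

def tobit_partial_effect(x, beta, sigma, j):
    """Partial effect of a continuous x_j on E(y|x): Phi(xb/sigma)*beta_j."""
    xb = sum(xk * bk for xk, bk in zip(x, beta))
    return norm_cdf(xb / sigma) * beta[j]
```

The same function applies to any $x$, including values for which some covariates are never observed alongside $y = 0$ in the sample.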


19.16. a. To obtain the expected value of

$$y_1 = z_1\delta_1 + \alpha_1 y_2 + u_1$$

conditional on $(z, r_2, v_2)$, use the fact that $y_2$ is a function of $(z, v_2)$, and use independence of $(u_1, v_2)$ and $z$:

$$\begin{aligned} E(y_1|z, r_2, v_2) &= z_1\delta_1 + \alpha_1 y_2 + E(u_1|z, r_2, v_2) \\ &= z_1\delta_1 + \alpha_1 y_2 + E(u_1|v_2). \end{aligned}$$

Now use the linearity assumption $E(u_1|v_2) = \rho_1 v_2$ to get

$$E(y_1|z, r_2, v_2) = z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2.$$

b. With $s_2 = 1[y_2 < r_2]$, $s_2$ is clearly a function of $(z, r_2, v_2)$, and so $s_2$ is redundant in $E(y_1|z, r_2, v_2, s_2)$:

$$E(y_1|z, r_2, v_2, s_2) = E(y_1|z, r_2, v_2) = z_1\delta_1 + \alpha_1 y_2 + \rho_1 v_2.$$

c. Because of part b, if we could observe $v_{i2}$ whenever $s_{i2} = 1$ we could consistently estimate $\delta_1$, $\alpha_1$, and $\rho_1$ by running the regression

$$y_{i1} \text{ on } z_{i1},\ y_{i2},\ v_{i2} \quad \text{if } s_{i2} = 1.$$

Naturally, we can replace $v_{i2} = y_{i2} - z_i\delta_2$ with $\hat{v}_{i2} = y_{i2} - z_i\hat{\delta}_2$ for a consistent estimator $\hat{\delta}_2$ of $\delta_2$. That estimator should be from a censored normal regression using

$$w_{i2} = \min(r_{i2}, z_i\delta_2 + v_{i2})$$

and then defining

$$\hat{v}_{i2} = y_{i2} - z_i\hat{\delta}_2 \quad \text{if } y_{i2} < r_{i2}.$$

Then run the regression

$$y_{i1} \text{ on } z_{i1},\ y_{i2},\ \hat{v}_{i2} \quad \text{if } s_{i2} = 1.$$
We can use the delta method to obtain valid standard errors, or bootstrap both steps of the 
procedure. A simple test of 
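19.17. a. The assumption is that, conditional on $(x_i, c_i)$, $u_{it}$ is independent of the entire history of censoring values, $(r_{i1}, r_{i2}, \ldots, r_{iT})$. This is a kind of strict exogeneity assumption on the censoring, which rules out the censoring values being related to current or past shocks to $y$. It does allow censoring to be arbitrarily correlated with the heterogeneity $c_i$.

b. Substitute for $y_{it}$ to get

$$w_{it} = 1[x_{it}\beta + c_i + u_{it} > r_{it}]$$

and then substitute for $c_i$ to get

$$\begin{aligned} w_{it} &= 1[x_{it}\beta + \psi + \bar{x}_i\xi + \eta\bar{r}_i + a_i + u_{it} > r_{it}] \\ &= 1[(a_i + u_{it}) > r_{it} - (x_{it}\beta + \psi + \bar{x}_i\xi + \eta\bar{r}_i)] \\ &= 1\left[\frac{(a_i + u_{it})}{(\sigma_a^2 + \sigma_u^2)^{1/2}} > \frac{r_{it} - (x_{it}\beta + \psi + \bar{x}_i\xi + \eta\bar{r}_i)}{(\sigma_a^2 + \sigma_u^2)^{1/2}}\right]. \end{aligned}$$

Now use the fact that $D(a_i + u_{it}|x_i, r_i) \sim \text{Normal}(0, \sigma_a^2 + \sigma_u^2)$:

$$\begin{aligned} P(w_{it} = 1|x_i, r_i) &= 1 - \Phi\left[\frac{r_{it} - (x_{it}\beta + \psi + \bar{x}_i\xi + \eta\bar{r}_i)}{(\sigma_a^2 + \sigma_u^2)^{1/2}}\right] \\ &= \Phi\left[\frac{x_{it}\beta + \psi + \bar{x}_i\xi + \eta\bar{r}_i - r_{it}}{(\sigma_a^2 + \sigma_u^2)^{1/2}}\right] \\ &= \Phi(x_{it}\beta_{au} + \psi_{au} + \bar{x}_i\xi_{au} + \eta_{au}\bar{r}_i + \gamma_{au}r_{it}), \end{aligned}$$

where $\beta_{au} = \beta/(\sigma_a^2 + \sigma_u^2)^{1/2}$, $\psi_{au} = \psi/(\sigma_a^2 + \sigma_u^2)^{1/2}$, $\xi_{au} = \xi/(\sigma_a^2 + \sigma_u^2)^{1/2}$, $\eta_{au} = \eta/(\sigma_a^2 + \sigma_u^2)^{1/2}$, and $\gamma_{au} = -1/(\sigma_a^2 + \sigma_u^2)^{1/2}$.

c. From part b, we can estimate all of the scaled coefficients, including $\gamma_{au}$, by pooled probit, provided $\{x_{it}\}$ and $\{r_{it}\}$ have time variation for at least some units. But

$$\beta = -\beta_{au}/\gamma_{au}$$

and so we just use

$$\hat{\beta} = -\hat{\beta}_{au}/\hat{\gamma}_{au}.$$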
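d. The pooled estimation from part c only allows us to estimate $\sigma_a^2 + \sigma_u^2$ and the unscaled parameters. If we add the assumption that $\{u_{it} : t = 1, 2, \ldots, T\}$ are independent then $\text{Cov}(a_i + u_{it}, a_i + u_{is}) = \text{Var}(a_i) = \sigma_a^2$ for all $t \neq s$. We can use a slight modification of correlated random effects probit, which takes the idiosyncratic error to have unit variance. To this end, write

$$\begin{aligned} w_{it} &= 1[x_{it}\beta + \psi + \bar{x}_i\xi + \eta\bar{r}_i + a_i + u_{it} > r_{it}] \\ &= 1\left[\frac{(x_{it}\beta + \psi + \bar{x}_i\xi + \eta\bar{r}_i - r_{it})}{\sigma_u} + \frac{a_i}{\sigma_u} + \frac{u_{it}}{\sigma_u} > 0\right] \\ &= 1[x_{it}\beta_u + \psi_u + \bar{x}_i\xi_u + \eta_u\bar{r}_i + \gamma_u r_{it} + g_i + e_{it} > 0], \end{aligned}$$

where $e_{it} = u_{it}/\sigma_u$ and $g_i = a_i/\sigma_u$. This shows that if we apply the CRE probit model to $w_{it}$ on $(x_{it}, 1, \bar{x}_i, \bar{r}_i, r_{it})$ we consistently estimate $\beta_u = \beta/\sigma_u$, $\psi_u = \psi/\sigma_u$, $\xi_u = \xi/\sigma_u$, and $\gamma_u = -1/\sigma_u$ as the coefficients and

$$\text{Var}(g_i) = \sigma_a^2/\sigma_u^2$$

as the heterogeneity variance. Thus, we can recover the original unscaled coefficients, $\sigma_u^2 = 1/\gamma_u^2$, and

$$\sigma_a^2 = \sigma_g^2/\gamma_u^2.$$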
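
The unscaling steps in parts c and d amount to a few divisions. The sketch below assumes the scaled probit coefficients have already been estimated by some probit routine; the function names and argument names are hypothetical.

```python
def unscale_pooled(beta_au, gamma_au):
    """Part c: recover beta from pooled probit output, beta = -beta_au / gamma_au.

    gamma_au is the (negative) coefficient on the censoring value r_it.
    """
    return [-b / gamma_au for b in beta_au]

def unscale_cre(beta_u, gamma_u, var_g):
    """Part d: recover (beta, sigma_u^2, sigma_a^2) from CRE probit output.

    beta_u  : coefficients on x_it, equal to beta / sigma_u
    gamma_u : coefficient on r_it, equal to -1 / sigma_u
    var_g   : estimated heterogeneity variance, sigma_a^2 / sigma_u^2
    """
    beta = [-b / gamma_u for b in beta_u]
    sigma_u2 = 1.0 / gamma_u ** 2
    sigma_a2 = var_g / gamma_u ** 2
    return beta, sigma_u2, sigma_a2
```

For instance, with $\beta = (2, -1)$, $\sigma_u = 2$, and $\sigma_a = 1$, the CRE probit coefficients would be $(1, -0.5)$ on $x_{it}$ and $-0.5$ on $r_{it}$, with heterogeneity variance $0.25$, and `unscale_cre` maps these back to the original values.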

19.18. a. Conditional on $y > 0$, $y$ follows a truncated normal distribution. So truncated normal regression would consistently estimate $\beta$ and $\sigma^2$.

b. Because we are claiming that $D(y|x)$ follows a type I Tobit in the population, we use the expected value derived from that assumption. Namely,

$$E(y|x) = \Phi(x\beta/\sigma)x\beta + \sigma\phi(x\beta/\sigma),$$

and then we compute derivatives and changes with respect to $x_j$, as usual.

This differs from a hurdle model because we do not have a separate model for $P(y = 0|x)$; we assume this is also governed by the Tobit model, so $P(y = 0|x) = 1 - \Phi(x\beta/\sigma)$.

c. We could not estimate a hurdle model in this case because we have no data when $y = 0$. We have not sampled from that part of the population, and so we cannot estimate a model for $P(s = 1|x)$ where $s = 1[y > 0]$.

19.19. a. First, if $r_i = 0$ the observation contains no information for estimating the distribution $D(y_i|x_i)$ because then $P(w_i = 0|x_i) = 1$ regardless of $D(y_i|x_i)$. So what follows is only relevant for $r_i \in \{1, 2, 3, \ldots\}$.

For $0 \leq w < r_i$,

$$P(w_i = w|x_i) = P(y_i = w|x_i) = f_y(w|x_i).$$

Next, $w_i = r_i$ if and only if $y_i \geq r_i$, and so

$$\begin{aligned} P(w_i = r_i|x_i) &= 1 - P(y_i < r_i|x_i) = 1 - P(y_i \leq r_i - 1|x_i) \\ &= 1 - F_y(r_i - 1|x_i). \end{aligned}$$

We can write the conditional density of $w_i$ as

$$f_w(w|x_i, r_i) = [f_y(w|x_i)]^{1[w < r_i]}[1 - F_y(r_i - 1|x_i)]^{1[w = r_i]}, \quad w = 0, 1, \ldots, r_i.$$

b. In the Poisson case with an exponential mean, the conditional density of $y_i$ is

$$f_y(y|x_i; \beta) = \exp[-\exp(x_i\beta)][\exp(x_i\beta)]^y / y!$$

and the cdf is

$$F_y(y|x_i) = \exp[-\exp(x_i\beta)]\sum_{h=0}^{y} [\exp(x_i\beta)]^h / h!.$$

Now just plug this into the general formula in part a.
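
To make the plug-in concrete, here is a small sketch of the log-likelihood contribution for one right-censored Poisson observation, following the density in part a. The function names are hypothetical and no maximization routine is shown.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) with mean mu, computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def poisson_cdf(y, mu):
    """P(Y <= y); returns 0 when y < 0."""
    if y < 0:
        return 0.0
    return sum(poisson_pmf(h, mu) for h in range(y + 1))

def censored_loglik_i(w, r, x, beta):
    """Log-likelihood contribution for w_i = min(y_i, r_i), exponential mean.

    Uncensored (w < r): log f_y(w | x).
    Censored   (w = r): log[1 - F_y(r - 1 | x)].
    """
    mu = math.exp(sum(xj * bj for xj, bj in zip(x, beta)))
    if w < r:
        return math.log(poisson_pmf(w, mu))
    return math.log(1.0 - poisson_cdf(r - 1, mu))
```

With `r = 0` the contribution is `log(1) = 0` for any `beta`, which matches the remark in part a that such observations carry no information.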

c. Maximum likelihood estimators based on censored data are generally not robust to misspecification of the underlying population distribution, even when that distribution is in the linear exponential family. (The log likelihood for the censored variable is not in the linear exponential family; even if it were, $E(w_i|x_i)$ depends on the underlying distribution.) Just like censored regression with a normal distribution is not robust for estimating the mean parameters under nonnormality, neither is censored regression with a Poisson distribution. One way to see this is to write down the score for the general case and observe that just having $E(y_i|x_i)$ correctly specified will not imply that the score has zero expectation.

d. As we know from earlier chapters, nonlinear least squares, Poisson QMLE, and other QMLEs in the LEF are robust for estimating the mean parameters. Thus, if there were no data censoring, we could use the Poisson QMLE to estimate $\beta$. With data censoring, $E(w_i|x_i)$ always depends on the underlying population distribution. Thus, in general we need to specify $D(y_i|x_i)$ even if we are primarily interested in $E(y_i|x_i)$.

e. Because data censoring requires us to have $D(y_i|x_i)$ correctly specified, a strong case can be made for specifying flexible models for $D(y_i|x_i)$, even if we are primarily interested in $E(y_i|x_i)$. For example, if we use a NegBin I or NegBin II model, these at least include the Poisson as (limiting) special cases. So, if we are pretty sure the underlying population has overdispersion, we can use one of these distributions in accounting for the right censoring. Ideally we would have a distribution that allows underdispersion, too.

Solutions to Chapter 20 Problems


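20.1. a. Just use calculus and set the derivative to zero:

$$\sum_{i=1}^{N_0} p_{j_i}^{-1}(w_i - \hat{\mu}_w) = 0$$

or

$$\sum_{i=1}^{N_0} p_{j_i}^{-1} w_i = \sum_{i=1}^{N_0} p_{j_i}^{-1}\hat{\mu}_w = \left(\sum_{i=1}^{N_0} p_{j_i}^{-1}\right)\hat{\mu}_w.$$

Solving for $\hat{\mu}_w$ gives

$$\hat{\mu}_w = \left(\sum_{i=1}^{N_0} p_{j_i}^{-1}\right)^{-1}\sum_{i=1}^{N_0} p_{j_i}^{-1} w_i = \sum_{i=1}^{N_0} v_i w_i,$$

where

$$v_i = \left(\sum_{h=1}^{N_0} p_{j_h}^{-1}\right)^{-1} p_{j_i}^{-1}.$$

b. From equation (20.7),

$$P(s_i = 1|z_i, w_i) = P(s_i = 1|z_i) = p_1 z_{i1} + \ldots + p_J z_{iJ},$$

which implies

$$E(s_i|j_i, w_i) = p_{j_i}$$

because the stratum for observation $i$ is $j_i$ if and only if $z_{ij_i} = 1$. Now

$$E(\tilde{\mu}_w) = E\left[N^{-1}\sum_{i=1}^{N}(s_i/p_{j_i})w_i\right] = N^{-1}\sum_{i=1}^{N} E[(s_i/p_{j_i})w_i]$$

and, by iterated expectations,

$$\begin{aligned} E[(s_i/p_{j_i})w_i] &= E\{E[(s_i/p_{j_i})w_i|j_i, w_i]\} \\ &= E\{[E(s_i|j_i, w_i)/p_{j_i}]w_i\} \\ &= E[(p_{j_i}/p_{j_i})w_i] = E(w_i) = \mu_o. \end{aligned}$$

This shows $E(\tilde{\mu}_w) = \mu_o$.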

c. Notice that $\tilde{\mu}_w$ depends on $N$, the number of times we sampled the population, including when we did not record the observation. By contrast, to obtain $\hat{\mu}_w$ we need only information on the sampling weights and data on the units actually kept. Therefore, in addition to knowing the sampling probabilities, $\tilde{\mu}_w$ requires the extra information that we know how many observations were discarded by the VP sampling scheme.
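
The informational difference between the two estimators is easy to see in code. In this sketch (function and argument names are hypothetical), the first function uses only the kept units, while the second also needs `n_draws`, the total number of draws from the population including discarded ones.

```python
def weighted_mean_kept(w_kept, p_kept):
    """Normalized-weight estimator from part a.

    Uses only the kept observations and their retention probabilities:
    sum(w_i / p_i) / sum(1 / p_i).
    """
    inv = [1.0 / p for p in p_kept]
    return sum(a * w for a, w in zip(inv, w_kept)) / sum(inv)

def weighted_mean_all_draws(w_kept, p_kept, n_draws):
    """Unbiased estimator from part b: (1/N) * sum(s_i * w_i / p_i).

    Discarded draws have s_i = 0 and contribute nothing to the sum, but the
    total number of draws N (n_draws) must still be known.
    """
    return sum(w / p for w, p in zip(w_kept, p_kept)) / n_draws
```

The two coincide only when the sum of the inverse probabilities over kept units happens to equal the number of draws.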

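20.2. Write the log likelihood for all $N$ observations as

$$\sum_{i=1}^{N}\sum_{h=1}^{J} z_{ih}[s_i\log(p_h) + (1 - s_i)\log(1 - p_h)].$$

For a given $j \in \{1, 2, \ldots, J\}$, take the derivative with respect to $p_j$, and set the result to zero:

$$\sum_{i=1}^{N} z_{ij}\left[\frac{s_i}{\hat{p}_j} - \frac{(1 - s_i)}{(1 - \hat{p}_j)}\right] = 0$$

or, by obtaining a common denominator,

$$\sum_{i=1}^{N} z_{ij}\left[\frac{(1 - \hat{p}_j)s_i - \hat{p}_j(1 - s_i)}{\hat{p}_j(1 - \hat{p}_j)}\right] = 0.$$

Of course, the problem only makes sense for interior solutions $0 < \hat{p}_j < 1$, so the first order condition is equivalent to

$$\sum_{i=1}^{N} z_{ij}[(1 - \hat{p}_j)s_i - \hat{p}_j(1 - s_i)] = 0.$$

Simple algebra gives

$$\hat{p}_j\sum_{i=1}^{N} z_{ij} = \sum_{i=1}^{N} z_{ij}s_i$$

or

$$\hat{p}_j = M_j/N_j,$$

where $M_j = \sum_{i=1}^{N} z_{ij}s_i$ is the number of observations kept in stratum $j$ and $N_j = \sum_{i=1}^{N} z_{ij}$ is the number of times stratum $j$ was drawn.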
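
The first order condition says the MLE of each retention probability is the fraction of draws from that stratum that were kept. A short sketch, with hypothetical function names and strata coded 0, 1, ..., J-1 for convenience:

```python
import math

def retention_mle(strata, kept, n_strata):
    """MLE of each p_j: (number kept in stratum j) / (number of draws from j).

    strata : stratum index (0-based) for every draw
    kept   : s_i in {0, 1} for every draw
    Assumes every stratum was drawn at least once.
    """
    draws = [0] * n_strata
    keeps = [0] * n_strata
    for j, s in zip(strata, kept):
        draws[j] += 1
        keeps[j] += s
    return [m / n for m, n in zip(keeps, draws)]

def retention_loglik(p, strata, kept):
    """Log likelihood: sum of s_i*log(p_j) + (1 - s_i)*log(1 - p_j)."""
    return sum(s * math.log(p[j]) + (1 - s) * math.log(1.0 - p[j])
               for j, s in zip(strata, kept))
```

Evaluating `retention_loglik` at the output of `retention_mle` and at nearby interior values is a quick way to confirm the maximizer numerically.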


20.3. a. To be specific, consider the case of variable probability sampling, where the probability weights are

$$p(z_i) = p_1 z_{i1} + \ldots + p_J z_{iJ} = P(s_i = 1|z_i, w_i),$$

where $w_i = (x_i, y_i)$. We can write the IPW nonlinear least squares objective function as

$$\min_{\theta \in \Theta} N^{-1}\sum_{i=1}^{N} \frac{s_i}{p(z_i)}[y_i - m(x_i, \theta)]^2,$$

which is a form useful for studying asymptotic properties. (For the asymptotic distribution theory, we divide the objective function by two to make the notation easier.)

b. For VP sampling, we have already assumed that each $p_j > 0$, and, because we can write

$$|s_i/p(z_i)| \leq \max(p_1^{-1}, \ldots, p_J^{-1}),$$

it follows that the regularity conditions sufficient for consistency of NLS on a random sample are also sufficient for NLS on a VP sample: the objective function is still continuous and the moment conditions do not need to be changed because

$$\frac{s_i}{p(z_i)}[y_i - m(x_i, \theta)]^2 \leq \max(p_1^{-1}, \ldots, p_J^{-1})[y_i - m(x_i, \theta)]^2.$$

Further, we know generally that

$$E\left\{\frac{s_i}{p(z_i)}[y_i - m(x_i, \theta)]^2\right\} = E\{[y_i - m(x_i, \theta)]^2\}$$

and so if $\theta_o$ uniquely minimizes the right hand side, it uniquely minimizes the left hand side, too.
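c. The theory in Section 19.8 can be applied directly. In particular, we can use equation (19.90) because the probabilities are known, not estimated. In the formula,

$$\begin{aligned} A_o &= E[H(w_i, \theta_o)] = E[A(x_i, \theta_o)] \\ &= E[\nabla_\theta m(x_i, \theta_o)'\nabla_\theta m(x_i, \theta_o)] \end{aligned}$$

and

$$\begin{aligned} B_o &= E\left\{\frac{s_i}{p(z_i)^2}\nabla_\theta q(w_i, \theta_o)'\nabla_\theta q(w_i, \theta_o)\right\} \\ &= E\left\{\frac{s_i}{p(z_i)^2}u_i^2\nabla_\theta m(x_i, \theta_o)'\nabla_\theta m(x_i, \theta_o)\right\} \\ &= E\left\{\frac{u_i^2}{p(z_i)}\nabla_\theta m(x_i, \theta_o)'\nabla_\theta m(x_i, \theta_o)\right\}, \end{aligned}$$

where $u_i = y_i - m(x_i, \theta_o)$. We can consistently estimate $A_o$ as

$$\hat{A} = N^{-1}\sum_{i=1}^{N} \frac{s_i}{p(z_i)}\nabla_\theta m(x_i, \hat{\theta}_w)'\nabla_\theta m(x_i, \hat{\theta}_w)$$

and $B_o$ as

$$\hat{B} = N^{-1}\sum_{i=1}^{N} \frac{s_i}{p(z_i)^2}\hat{u}_i^2\nabla_\theta m(x_i, \hat{\theta}_w)'\nabla_\theta m(x_i, \hat{\theta}_w),$$

where $\hat{u}_i = y_i - m(x_i, \hat{\theta}_w)$ are the residuals. Then

$$\widehat{\text{Avar}}(\hat{\theta}_w) = \hat{A}^{-1}\hat{B}\hat{A}^{-1}/N$$

(which does not actually require knowing $N$, as it cancels everywhere).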
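
The sandwich formula and the cancellation of $N$ can be checked in a stripped-down case. The sketch below assumes a scalar parameter and an exponential mean $m(x, \theta) = \exp(x\theta)$, which are illustrative choices rather than anything in the problem; `theta_hat` is taken as given, and only kept observations ($s_i = 1$) are passed in.

```python
import math

def ipw_nls_avar(y, x, p, theta_hat, n_draws=None):
    """Sandwich variance estimate for IPW NLS with scalar theta.

    Assumes m(x, theta) = exp(x * theta), so the gradient is x * exp(x * theta).
    y, x, p hold the outcome, regressor, and known retention probability for
    each kept observation. n_draws is optional: the result does not depend on it.
    """
    sum_a = 0.0
    sum_b = 0.0
    for yi, xi, pi in zip(y, x, p):
        m = math.exp(xi * theta_hat)
        grad = xi * m
        u = yi - m
        sum_a += grad * grad / pi                  # kept units' terms of N * A_hat
        sum_b += u * u * grad * grad / (pi * pi)   # kept units' terms of N * B_hat
    if n_draws is None:
        return sum_b / (sum_a * sum_a)
    a_hat = sum_a / n_draws
    b_hat = sum_b / n_draws
    return (b_hat / (a_hat * a_hat)) / n_draws
```

Passing different values of `n_draws` returns the same number up to rounding, which is the cancellation noted above.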

d. The formula does not generally simplify because $E(u_i^2|x_i, z_i)$ might depend on $z_i$ even if $\text{Var}(u_i|x_i) = \sigma_o^2$. [In fact, we do not even assume that $E(u_i|x_i, z_i) = 0$ in this problem because the stratification may be endogenous.]

e. If $m(x, \theta)$ is misspecified we must use a more general estimator for $A^*$ based on

$$A^* = E[H(w_i, \theta^*)] = E[\nabla_\theta m(x_i, \theta^*)'\nabla_\theta m(x_i, \theta^*) - u_i^*\nabla_\theta^2 m(x_i, \theta^*)],$$

where $\theta^*$ is the pseudo-true value of $\theta$ that solves the population minimization problem and $u_i^* = y_i - m(x_i, \theta^*)$. Our estimator of $A^*$ is

$$\hat{A} = N^{-1}\sum_{i=1}^{N} \frac{s_i}{p(z_i)}[\nabla_\theta m(x_i, \hat{\theta}_w)'\nabla_\theta m(x_i, \hat{\theta}_w) - \hat{u}_i\nabla_\theta^2 m(x_i, \hat{\theta}_w)].$$

The estimator of $B^*$ can be the same as in part c.

20.4. First, we can write the unweighted objective function as

$$\begin{aligned} N^{-1}\sum_{j=1}^{J}\sum_{i=1}^{N_j} q(w_{ij}, \theta) &= \sum_{j=1}^{J}(N_j/N)\left[N_j^{-1}\sum_{i=1}^{N_j} q(w_{ij}, \theta)\right] \\ &= \sum_{j=1}^{J} H_j\left[N_j^{-1}\sum_{i=1}^{N_j} q(w_{ij}, \theta)\right], \end{aligned}$$

as suggested in the hint. Further, by the same argument as on page 860, $N_j^{-1}\sum_{i=1}^{N_j} q(w_{ij}, \theta)$ converges (uniformly) to $E[q(w, \theta)|w \in \mathcal{W}_j] = E[q(w, \theta)|x \in \mathcal{X}_j]$, where we use the fact that the strata are determined by the conditioning variables and given by $\mathcal{X}_1, \ldots, \mathcal{X}_J$. Therefore, if $H_j \to \bar{H}_j$ as $N \to \infty$ the unweighted objective function converges uniformly to

$$\bar{H}_1 E[q(w, \theta)|x \in \mathcal{X}_1] + \ldots + \bar{H}_J E[q(w, \theta)|x \in \mathcal{X}_J]. \qquad (20.97)$$

Given that $\theta_o$ solves (20.15) for each $x$, we can also show $\theta_o$ minimizes $E[q(w, \theta)|x \in \mathcal{X}_j]$ over $\Theta$ for each $j$: by iterated expectations (since the indicator $1[x \in \mathcal{X}_j]$ is a function of $x$),

$$E[q(w, \theta)|x \in \mathcal{X}_j] = E\{E[q(w, \theta)|x]|x \in \mathcal{X}_j\},$$

and if $\theta_o$ minimizes $E[q(w, \theta)|x]$, it must also minimize $E\{E[q(w, \theta)|x]|x \in \mathcal{X}_j\}$. Therefore, $\theta_o$ is one minimizer of (20.97) over $\Theta$. Now we just have to show it is the unique minimizer if it uniquely minimizes $E[q(w, \theta)]$. Without the assumption $\bar{H}_j > 0$, $\theta_o$ need not be the unique minimizer of (20.97). To show uniqueness when each $\bar{H}_j$ is strictly positive, let $s_j = 1[x \in \mathcal{X}_j]$. Then we can write, for any $\theta$,

$$E[q(w, \theta)] - E[q(w, \theta_o)] = \sum_{j=1}^{J} Q_j\{E[q(w, \theta)|s_j = 1] - E[q(w, \theta_o)|s_j = 1]\},$$

where the $Q_j$ are the population frequencies. By assumption, the left hand side is strictly positive when $\theta \neq \theta_o$, which means, because $Q_j > 0$ for all $j$, $E[q(w, \theta)|s_j = 1] - E[q(w, \theta_o)|s_j = 1]$ must be strictly positive for at least one $j$; we already know that each difference is nonnegative. This, along with the fact that $\bar{H}_j > 0$, $j = 1, \ldots, J$, implies that (20.97) is uniquely minimized at $\theta_o$.

20.5. a. The Stata output is given below. The variables with "bar" added on denote the district-level averages. Note that we can still use xtreg even though this is a cluster sample, not a panel data set. An alternative for obtaining the FE estimates is the areg command in Stata. The pooled OLS and FE estimates are identical on all explanatory variables. The pooled OLS standard errors reported below are almost certainly incorrect because they assume no within-district correlation in the unobservables.


reg lavgsal bs lstaff lenroll lunch bsbar lstaffbar lenrollbar lunchbar

      Source |       SS       df       MS              Number of obs =    1848
-------------+------------------------------           F(  8,  1839) =  228.60
       Model |  49.9510474     8  6.24388093           Prob > F      =  0.0000
    Residual |  50.2303314  1839  .027313938           R-squared     =  0.4986
-------------+------------------------------           Adj R-squared =  0.4964
       Total |  100.181379  1847  .054240054           Root MSE      =  .16527

------------------------------------------------------------------------------
     lavgsal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          bs |  -.4948449   .2199466    -2.25   0.025    -.9262162   -.0634736
      lstaff |  -.6218901   .0277027   -22.45   0.000    -.6762221   -.5675581
     lenroll |  -.0515063   .0155411    -3.31   0.001    -.0819865   -.0210262
       lunch |   .0005138   .0003452     1.49   0.137    -.0001632    .0011908
       bsbar |    .441438   .2630336     1.68   0.093     -.074438    .9573139
   lstaffbar |  -.1493942   .0370985    -4.03   0.000    -.2221538   -.0766346
  lenrollbar |   .0315714   .0184565     1.71   0.087    -.0046266    .0677694
    lunchbar |  -.0016765   .0003903    -4.30   0.000    -.0024419    -.000911
       _cons |   13.98544    .141118    99.10   0.000     13.70867    14.26221
------------------------------------------------------------------------------


xtreg lavgsal bs lstaff lenroll lunch, fe

Fixed-effects (within) regression               Number of obs      =      1848
Group variable: distid                          Number of groups   =       537

R-sq:  within  = 0.5486                         Obs per group: min =
       between = 0.3544                                        avg =        3.
       overall = 0.4567                                        max =       162

                                                F(4,1307)          =    397.05
corr(u_i, Xb)  = 0.1433                         Prob > F           =    0.0000

------------------------------------------------------------------------------
     lavgsal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          bs |  -.4948449    .133039    -3.72   0.000    -.7558382   -.2338515
      lstaff |  -.6218901   .0167565   -37.11   0.000    -.6547627   -.5890175
     lenroll |  -.0515063   .0094004    -5.48   0.000    -.0699478   -.0330648
       lunch |   .0005138   .0002088     2.46   0.014     .0001042    .0009234
       _cons |   13.61783   .1133406   120.15   0.000     13.39548    13.84018
-------------+----------------------------------------------------------------
     sigma_u |  .15491886
     sigma_e |  .09996638
         rho |  .70602068   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(536, 1307) =     7.24           Prob > F = 0.0000


b. The RE estimates are given below, with and without cluster-robust standard errors. Also, the cluster-robust standard errors for FE are provided. The fully robust standard errors are bigger than the nonrobust ones, suggesting there might be additional within-district correlation even after accounting for an additive district effect.


xtreg lavgsal bs lstaff lenroll lunch bsbar lstaffbar lenrollbar lunchbar, re

Random-effects GLS regression                   Number of obs      =      1848
Group variable: distid                          Number of groups   =       537

R-sq:  within  = 0.5486                         Obs per group: min =
       between = 0.4006                                        avg =        3.
       overall = 0.4831                                        max =       162

Random effects u_i ~ Gaussian                   Wald chi2(8)       =   1943.89
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     lavgsal |      Coef.   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          bs |  -.4948449   .1334822                     -.7564652   -.2332245
      lstaff |  -.6218901   .0168123                     -.6548417   -.5889385
     lenroll |  -.0515063   .0094317                     -.0699921   -.0330205
       lunch |   .0005138   .0002095                      .0001032    .0009244
       bsbar |
   lstaffbar |
  lenrollbar |
    lunchbar |
       _cons |
-------------+----------------------------------------------------------------
     sigma_u |  .12627558
     sigma_e |  .09996638
         rho |  .61473634
------------------------------------------------------------------------------


xtreg lavgsal bs lstaff lenroll lunch bsbar lstaffbar lenrollbar lunchbar, re cluster(distid)

Random-effects GLS regression                   Number of obs      =      1848
Group variable: distid                          Number of groups   =       537

R-sq:  within  = 0.5486                         Obs per group: min =
       between = 0.4006                                        avg =        3.
       overall = 0.4831                                        max =       162

Random effects u_i ~ Gaussian                   Wald chi2(8)       =    556.49
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

                               (Std. Err. adjusted for 537 clusters in distid)
------------------------------------------------------------------------------
             |               Robust
     lavgsal |      Coef.   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          bs |  -.4948449   .1939422                     -.8749646   -.1147252
      lstaff |  -.6218901   .0432281                     -.7066157   -.5371645
     lenroll |  -.0515063    .013103                     -.0771876    -.025825
       lunch |   .0005138    .000213                      .0000964    .0009312
       bsbar |   .2998553   .3031961                     -.2943981    .8941087
   lstaffbar |  -.0255493   .0651932                     -.1533256    .1022269
  lenrollbar |   .0657286    .020655                      .0252455    .1062116
    lunchbar |  -.0007259   .0004378                     -.0015839    .0001322
       _cons |   13.22003   .2556139                      12.71904    13.72103
-------------+----------------------------------------------------------------
     sigma_u |  .12627558
     sigma_e |  .09996638
         rho |  .61473634
------------------------------------------------------------------------------


xtreg lavgsal bs lstaff lenroll lunch, fe cluster(distid)

Fixed-effects (within) regression               Number of obs      =      1848
Group variable: distid                          Number of groups   =       537

R-sq:  within  = 0.5486                         Obs per group: min =
       between = 0.3544                                        avg =
       overall = 0.4567                                        max =

                                                F(4,536)           =
corr(u_i, Xb)  = 0.1433                         Prob > F           =

                               (Std. Err. adjusted for 537 clusters in distid)
------------------------------------------------------------------------------
             |               Robust
     lavgsal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          bs |  -.4948449   .1937316    -2.55   0.011    -.8754112   -.1142785
      lstaff |  -.6218901   .0431812   -14.40   0.000    -.7067152   -.5370649
     lenroll |  -.0515063   .0130887    -3.94   0.000    -.0772178   -.0257948
       lunch |   .0005138   .0002127     2.42   0.016     .0000959    .0009317
       _cons |   13.61783   .2413169    56.43   0.000     13.14379    14.09187
-------------+----------------------------------------------------------------
     sigma_u |  .15491886
     sigma_e |  .09996638
         rho |  .70602068   (fraction of variance due to u_i)
------------------------------------------------------------------------------


c. The robust Wald test for joint significance of the four district-level averages gives a strong rejection of the null, with p-value = .0004. Therefore, we conclude that at least some of the variables are correlated with unobserved district effects.

qui xtreg lavgsal bs lstaff lenroll lunch bsbar lstaffbar lenrollbar lunchbar, re cluster(distid)

test bsbar lstaffbar lenrollbar lunchbar

 ( 1)  bsbar = 0
 ( 2)  lstaffbar = 0
 ( 3)  lenrollbar = 0
 ( 4)  lunchbar = 0

           chi2(  4) =   20.70
         Prob > chi2 =    0.0004


20.6. a. Only three schools in the sample have reported benefits/salary ratios of at least .5. The highest of these is about .66.

count if bs >= .5
    3

list distid bs if bs >= .5

      +-------------------+
      | distid         bs |
      |-------------------|
  68. |   9030   .6594882 |
1127. |  63160   .5747756 |
1670. |  82040   .5022581 |
      +-------------------+

b. The magnitude of the coefficient on bs falls somewhat and the cluster-robust standard error increases substantially, from about .194 to .245, likely due to the reduction in the variation of bs after dropping the three observations listed in part a. There is still some evidence of a salary-benefits tradeoff.


xtreg lavgsal bs lstaff lenroll lunch if bs < .5, fe cluster(distid) 


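Fixed-effects (within) regression               Number of obs      =      1845
Group variable: distid                          Number of groups   =       537

R-sq:  within  = 0.5474                         Obs per group: min =
       between = 0.3552                                        avg =
       overall = 0.4567                                        max =

                                                F(4,536)           =     58.06
corr(u_i, Xb)  = 0.1452                         Prob > F           =    0.0000

               (Std. Err. adjusted for 537 clusters in distid)
----------------------------------------------------------------
             |               Robust
     lavgsal |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
          bs |  -.4560107    .245449     -.9381705    .0261492
      lstaff |  -.6226836   .0431074     -.7073637   -.5380035
     lenroll |  -.0518125   .0131213      -.077588    -.026037
       lunch |   .0005151   .0002157      .0000913    .0009389
       _cons |    13.6096   .2466242      13.12513    14.09407
-------------+--------------------------------------------------
     sigma_u |  .15486353
     sigma_e |  .10003476
         rho |  .70558835   (fraction of variance due to u_i)
----------------------------------------------------------------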


c. The LAD estimates below give a point estimate that indicates a tradeoff, but it is smaller in magnitude than in part a or part b. The standard error, which is not robust to cluster correlation, implies t = -1.22. Therefore, using LAD, there is little evidence of a tradeoff.


qreg lavgsal bs lstaff lenroll lunch bsbar lstaffbar lenrollbar lunchbar

Median regression                                    Number of obs =      1848
  Raw sum of deviations 334.3106 (about 10.482654)
  Min sum of deviations 234.7491                     Pseudo R2     =    0.2978

----------------------------------------------------------------
     lavgsal |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
          bs |  -.3066784   .2511169     -.7991826    .1858258
      lstaff |  -.6555687   .0313058     -.7169673     -.59417
     lenroll |  -.0635032    .017727     -.0982703    -.028736
       lunch |   .0005538   .0003954     -.0002217    .0013293
       bsbar |   .3679283   .3003398     -.2211145    .9569712
   lstaffbar |  -.1374073   .0421226     -.2200204   -.0547941
  lenrollbar |   .0075581   .0210143     -.0336564    .0487726
    lunchbar |  -.0014894   .0004477     -.0023675   -.0006113
       _cons |   14.23874   .1612496      13.92249    14.55499
----------------------------------------------------------------


20.7. a. Out of 1,683 schools, 922 have all five years of data. The smallest number of years is three. Note that the tab command reports many more observations than schools because there are multiple years per school.


xtsum math4

Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
math4    overall |  63.57726   20.19047        2.9        100 |     N =    7150
         between |             16.08074      11.75      98.94 |     n =    1683
         within  |             12.37335   13.71059   122.3439 | T-bar = 4.24837

egen tobs = sum(1), by(schid)

count if tobs == 5 & y98
  922

tab tobs

       tobs |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |      1,512       21.15       21.15
          4 |      1,028       14.38       35.52
          5 |      4,610       64.48      100.00
------------+-----------------------------------
      Total |      7,150      100.00


b. The pooled OLS estimates, with all time averages included, and the fixed effects estimates (with so-called “school fixed effects”) are given below. Variables with a “b” at the end are the within-school time averages. As expected, the POLS and FE coefficients on the time-varying variables are identical, including the coefficients on the year dummies.

The coefficient on lunchb is -.426, and its fully robust t statistic is -11.76. Therefore, the average poverty level over the available years has a very large effect on the math pass rate: a ten percentage point increase in the average poverty rate predicts a pass rate that is about 4.3 percentage points lower.
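The equivalence between pooled OLS with time averages and the within (FE) estimator can be checked numerically. The following is a minimal sketch, not part of the original solution: the tiny unbalanced panel, the single regressor, and the helper names `solve` and `ols` are all made up for illustration.

```python
# Sketch: pooled OLS of y on (1, x, xbar_i) versus the within (FE) slope on x.
# The data are hypothetical; the panel is deliberately unbalanced.

def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# unit id -> (x values, y values); units have 3, 2, and 4 time periods
panel = {1: ([1.0, 2.0, 4.0], [2.0, 3.0, 7.0]),
         2: ([3.0, 5.0], [1.0, 4.0]),
         3: ([0.0, 1.0, 2.0, 6.0], [5.0, 4.0, 6.0, 9.0])}

rows, y, num, den = [], [], 0.0, 0.0
for xs, ys in panel.values():
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    for x, yv in zip(xs, ys):
        rows.append([1.0, x, xbar])          # intercept, x_it, time average of x
        y.append(yv)
        num += (x - xbar) * (yv - ybar)      # within (time-demeaned) cross product
        den += (x - xbar) ** 2

b_pols = ols(rows, y)[1]   # coefficient on x_it with the time average included
b_fe = num / den           # within estimator
print(round(b_pols, 6), round(b_fe, 6))
```

The two printed slopes agree because, after partialling out the intercept and the time average, what is left of x is exactly its within-unit deviation, and this holds whether or not the panel is balanced.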
reg math4 lavgrexp lunch lenrol y95 y96 y97 y98 lavgrexpb lunchb lenrolb y95b y96b y97b y98b, cluster(distid)

Linear regression                                      Number of obs =    7150
                                                       F( 14,   466) =  182.55
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4147
                                                       Root MSE      =  15.462

               (Std. Err. adjusted for 467 clusters in distid)
----------------------------------------------------------------
             |               Robust
       math4 |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
    lavgrexp |   6.288376    3.13387      .1301085    12.44664
       lunch |  -.0215072   .0399402     -.0999924     .056978
      lenrol |  -2.038461                             2.087466
         y95 |    11.6192                             13.03679
         y96 |   13.05561                              14.8893
         y97 |   10.14771                             12.03046
         y98 |   23.41404                 21.39431    25.43377
   lavgrexpb |     2.7178                -5.224258    10.65986
      lunchb |  -.4256461                -.4967642   -.3545279
     lenrolb |   .2880016                  -3.9805    4.556503
        y95b |   21.26329                -10.09639    52.62297
        y96b |   15.69885                 2.879602     28.5181
        y97b |   20.66597                -10.20536     51.5373
        y98b |  -8.501184                -45.63248    28.63011
       _cons |  -6.616139                -55.89125    42.65897
----------------------------------------------------------------

xtreg math4 lavgrexp lunch lenrol y95 y96 y97 y98, fe cluster(distid)

Fixed-effects (within) regression               Number of obs      =      7150
Group variable: schid                           Number of groups   =      1683

R-sq:  within  = 0.3602                         Obs per group: min =
       between = 0.0292                                        avg =
       overall = 0.1514                                        max =

                                                F(7,466)           =    259.90
corr(u_i, Xb)  = 0.0073                         Prob > F           =    0.0000

               (Std. Err. adjusted for 467 clusters in distid)
----------------------------------------------------------------
             |               Robust
       math4 |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
    lavgrexp |   6.288376   3.132334      .1331271    12.44363
       lunch |  -.0215072   .0399206     -.0999539    .0569395
      lenrol |  -2.038461   2.098607     -6.162365    2.085443
         y95 |    11.6192   .7210398      10.20231     13.0361
         y96 |   13.05561   .9326851      11.22282     14.8884
         y97 |   10.14771   .9576417       8.26588    12.02954
         y98 |   23.41404   1.027313       21.3953    25.43278
       _cons |   11.84422   32.68429     -52.38262    76.07107
-------------+--------------------------------------------------
     sigma_u |   15.84958
     sigma_e |  11.325028
         rho |  .66200804   (fraction of variance due to u_i)
----------------------------------------------------------------


c. The RE estimates are given below, and the coefficients on the time-varying variables are identical to the FE estimates. The RE coefficients on the time averages are not identical to those for POLS. In particular, on lunchb, the RE coefficient is -.415, just slightly smaller in magnitude than the POLS estimate. It has a slightly smaller fully robust t statistic (in absolute value).


xtreg math4 lavgrexp lunch lenrol y95 y96 y97 y98 lavgrexpb lunchb lenrolb 
y95b y96b y97b y98b, re cluster(distid) 


Random-effects GLS regression                   Number of obs      =      7150
Group variable: schid                           Number of groups   =      1683

R-sq:  within  = 0.3602                         Obs per group: min =
       between = 0.4366                                        avg = 4.
       overall = 0.4146                                        max =

Random effects u_i ~ Gaussian                   Wald chi2(14)      =   2532.10
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

                              (Std. Err. adjusted for 467 clusters in distid)
------------------------------------------------------------------------------
             |               Robust
       math4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lavgrexp |   6.288376    3.13387     2.01   0.045     .1461029    12.43065
       lunch |  -.0215072   .0399402    -0.54   0.590    -.0997886    .0567741
      lenrol |  -2.038461   2.099636    -0.97   0.332    -6.153671     2.07675
         y95 |    11.6192   .7213934    16.11   0.000      10.2053    13.03311
         y96 |   13.05561   .9331425    13.99   0.000     11.22668    14.88453
         y97 |   10.14771   .9581113    10.59   0.000     8.269847    12.02557
         y98 |   23.41404   1.027817    22.78   0.000     21.39956    25.42852
   lavgrexpb |   2.569862    3.99586     0.64   0.520    -5.261881     10.4016
      lunchb |  -.4153413   .0363218   -11.44   0.000    -.4865308   -.3441518
     lenrolb |   .3829623   2.157847     0.18   0.859     -3.84634    4.612264
        y95b |   18.96418   15.24131     1.24   0.213    -10.90824    48.83659
        y96b |   16.16473   6.628049     2.44   0.015     3.173993    29.15547
        y97b |   17.50964   15.42539     1.14   0.256    -12.72357    47.74285
        y98b |  -9.420143   18.25294    -0.52   0.606    -45.19524    26.35495
       _cons |  -5.159784   24.08649    -0.21   0.830    -52.36844    42.04887
-------------+----------------------------------------------------------------
     sigma_u |  10.702446
     sigma_e |  11.325028
         rho |  .47175866   (fraction of variance due to u_i)
------------------------------------------------------------------------------


d. When we drop the time averages of the year dummies the RE estimates are slightly different from the FE estimates. That is because we must now recognize that, with an unbalanced panel, the time averages of the year dummies are no longer constant. With a balanced panel, the time averages are 1/T in each case. Now, the average is either zero (if the unit does not appear in the appropriate year) or $1/T_i$, where $T_i$ is the total number of years for unit (school) $i$. For example, the list command below shows that the school with identifier number 557 has data for the years 1994, 1997, and 1998. Therefore, y95b and y96b are both zero, while y97b and y98b are both 1/3. With an unbalanced panel, we should include the time averages of the year dummies. In effect, this is allowing certain forms of sample selection to be correlated with the unobserved school heterogeneity.
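The point about the time averages of the year dummies can be sketched in a few lines. This is only an illustration: the helper name `year_dummy_averages` is hypothetical, and the only data used are the observed years for school 557 quoted in the text.

```python
# Sketch: time averages of the year dummies for one unit of a panel.
# The base year (1994) has no dummy, as in the regressions in this problem.

def year_dummy_averages(years_observed, dummy_years=(1995, 1996, 1997, 1998)):
    """Return {name: average of the year dummy over the unit's observed years}."""
    t_i = len(years_observed)  # number of years the unit appears (T_i)
    return {
        "y%02db" % (yr % 100): sum(1 for y in years_observed if y == yr) / t_i
        for yr in dummy_years
    }

# School 557 appears in 1994, 1997, and 1998 only (T_i = 3).
school_557 = year_dummy_averages([1994, 1997, 1998])

# A school observed in all five years (T_i = 5): every average equals 1/5.
balanced = year_dummy_averages([1994, 1995, 1996, 1997, 1998])

print(school_557)
print(balanced)
```

For a fully observed school every average is the same constant, so the averages are absorbed by the intercept; for school 557 they vary with the pattern of missing years, which is why they matter in the unbalanced case.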


xtreg math4 lavgrexp lunch lenrol y95 y96 y97 y98 lavgrexpb lunchb lenrolb, re cluster(distid)

Random-effects GLS regression                   Number of obs      =      7150
Group variable: schid                           Number of groups   =      1683

R-sq:  within  = 0.3602                         Obs per group: min =
       between = 0.4291                                        avg =
       overall = 0.4105                                        max =

Random effects u_i ~ Gaussian                   Wald chi2(10)      =   2073.48
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

                              (Std. Err. adjusted for 467 clusters in distid)
------------------------------------------------------------------------------
             |               Robust
       math4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lavgrexp |   6.222429   3.121881     1.99   0.046     .1036546     12.3412
       lunch |  -.0209812   .0402425    -0.52   0.602     -.099855    .0578926
      lenrol |   -2.06064   2.070938    -1.00   0.320    -6.119604    1.998325
         y95 |   11.78595   .7084874    16.64   0.000     10.39734    13.17456
         y96 |   13.16626     .91466    14.39   0.000     11.37356    14.95896
         y97 |   10.21612   .9441691    10.82   0.000     8.365579    12.06665
         y98 |   23.46409   1.055457    22.23   0.000     21.39544    25.53275
   lavgrexpb |   2.417603   3.887099     0.62   0.534     -5.20097    10.03618
      lunchb |  -.4088571   .0365525   -11.19   0.000    -.4804986   -.3372155
     lenrolb |   .7979708   2.109349     0.38   0.705    -3.336278     4.93222
       _cons |   2.619295   24.78096     0.11   0.916    -45.95049    51.18908
-------------+----------------------------------------------------------------
     sigma_u |  10.702446
     sigma_e |  11.325028
         rho |  .47175866   (fraction of variance due to u_i)
------------------------------------------------------------------------------

list schid year y95b y96b y97b y98b if schid == 557

      +---------------------------------------------------------+
      | schid   year   y95b   y96b        y97b        y98b |
      |---------------------------------------------------------|
 740. |   557   1994      0      0   .33333333   .33333333 |
 741. |   557   1997      0      0   .33333333   .33333333 |
 742. |   557   1998      0      0   .33333333   .33333333 |
      +---------------------------------------------------------+


e. The FE estimates without the year dummies are given below. The coefficient on the 


spending variable is more than seven times larger than when the year dummies are included. 


The estimate without the year dummies is very misleading. During this period in Michigan, 


spending was increasing and, at the same time, the definition of a passing score was changed 


so that more students passed the exam. Thus, without controlling for time dummies, most of 


the relationship between pass rates and spending is spurious. 


xtreg math4 lavgrexp lunch lenrol, fe cluster(distid) 


Fixed-effects (within) regression               Number of obs      =      7150
Group variable: schid                           Number of groups   =      1683

R-sq:  within  = 0.1632                         Obs per group: min =
       between = 0.0001                                        avg = 4.
       overall = 0.0233                                        max =

                                                F(3,466)           =    136.54
corr(u_i, Xb)  = -0.3272                        Prob > F           =    0.0000

                              (Std. Err. adjusted for 467 clusters in distid)
------------------------------------------------------------------------------
             |               Robust
       math4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lavgrexp |   45.00103   2.452645    18.35   0.000     40.18141    49.82064
       lunch |   .0179948   .0377204     0.48   0.634    -.0561284     .092118
      lenrol |  -2.372125   3.403866    -0.70   0.486    -9.060952    4.316701
       _cons |  -294.8467   32.11083    -9.18   0.000    -357.9467   -231.7468
-------------+----------------------------------------------------------------
     sigma_u |  17.573721
     sigma_e |    12.9465
         rho |  .64820501   (fraction of variance due to u_i)
------------------------------------------------------------------------------


f. The POLS and RE estimates, without the time averages, are given below. The spending effects are larger than FE and the effect of the lunch variable is much larger. If we do not remove the school effect (of which a large component is demographics that do not change over time) then the poverty measure lunch becomes very important. From the POLS/RE estimates with the time averages included, it is really the average poverty level over several years that has the most predictive power. Of course, the lunch variable does not vary across time nearly as much as it does across schools. Therefore, using FE, it is difficult to separate the effect of $lunch_{it}$ from $c_i$.


reg math4 lavgrexp lunch lenrol y95 y96 y97 y98, cluster(distid) 


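Linear regression                                      Number of obs =    7150
                                                       F(  7,   466) =  256.84
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4029
                                                       Root MSE      =  15.609

               (Std. Err. adjusted for 467 clusters in distid)
----------------------------------------------------------------
             |               Robust
       math4 |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
    lavgrexp |   8.628338   2.488897      3.737487    13.51919
       lunch |  -.4255479   .0391249     -.5024309   -.3486648
      lenrol |  -1.294046   1.149539     -3.552969    .9648762
         y95 |   12.09916   .8909378      10.34841    13.84992
         y96 |                            10.85308
         y97 |                            8.104584
         y98 |                            21.03519
       _cons |                           -42.62005
----------------------------------------------------------------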


xtreg math4 lavgrexp lunch lenrol y95 y96 y97 y98, re cluster(distid)

Random-effects GLS regression                   Number of obs      =
Group variable: schid                           Number of groups   =

R-sq:  within  = 0.3455                         Obs per group: min =
       between = 0.4288                                        avg =
       overall = 0.4016                                        max =

Random effects u_i ~ Gaussian                   Wald chi2(7)       =   1886.18
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

                     (Std. Err. adjusted for 467 clusters in distid)
------------------------------------------------------------------------
             |               Robust
       math4 |      Coef.   Std. Err.      z      [95% Conf. Interval]
-------------+----------------------------------------------------------
    lavgrexp |   7.838068   2.157833              3.608793    12.06734
       lunch |  -.3785643   .0400361             -.4570336   -.3000949
      lenrol |  -1.391074   .9449022             -3.243048    .4609008
         y95 |   11.66598   .7704663               10.1559    13.17607
         y96 |   12.88762   .9420724              11.04119    14.73404
         y97 |   10.18776    .896855              8.429958    11.94557
         y98 |   23.53236   1.029968              21.51366    25.55106
       _cons |              20.06401     0.41      -31.158    47.49148
-------------+----------------------------------------------------------
     sigma_u |  10.702446
     sigma_e |  11.325028
         rho |  .47175866   (fraction of variance due to u_i)
------------------------------------------------------------------------


g. It seems pretty clear we need to go with the FE estimate and its standard error robust to 


serial correlation within school and cluster correlation within district. Removing a school 


effect most likely gives us the least biased estimator of school spending. Clustering at the 


district level, rather than just at the school level, increases the standard error to 3.13 from 


about 2.43, and so it seems prudent to use the standard error clustered at the district level. 


xtreg math4 lavgrexp lunch lenrol y95 y96 y97 y98, fe cluster(schid) 


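Fixed-effects (within) regression               Number of obs      =
Group variable: schid                           Number of groups   =

R-sq:  within  = 0.3602                         Obs per group: min =
       between = 0.0292                                        avg =
       overall = 0.1514                                        max =

                                                F(7,1682)          =    431.08
corr(u_i, Xb)  = 0.0073                         Prob > F           =    0.0000

                               (Std. Err. adjusted for 1683 clusters in schid)
------------------------------------------------------------------------------
             |               Robust
       math4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lavgrexp |   6.288376   2.431317     2.59   0.010     1.519651     11.0571
       lunch |  -.0215072   .0390732    -0.55   0.582    -.0981445      .05513
      lenrol |  -2.038461   1.789094    -1.14   0.255    -5.547545    1.470623
         y95 |    11.6192   .5358469    21.68   0.000                  12.6702
         y96 |   13.05561   .6910815    18.89   0.000                 14.41108
         y97 |   10.14771   .7326314    13.85   0.000                 11.58468
         y98 |   23.41404   .7669553    30.53   0.000                 24.91833
       _cons |   11.84422   25.16643     0.47   0.638                 61.20503
-------------+----------------------------------------------------------------
     sigma_u |   15.84958
     sigma_e |  11.325028
         rho |  .66200804   (fraction of variance due to u_i)
------------------------------------------------------------------------------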


20.8. a. The information contained in $(x_g, Z_g, c_g)$ and $(x_g, Z_g, a_g)$ is the same, and so if we substitute for $c_g$ we have

$$E(y_{gm}|x_g, Z_g, c_g) = \Phi(\alpha + x_g\beta + z_{gm}\gamma + c_g)$$
$$= \Phi(\alpha + x_g\beta + z_{gm}\gamma + \eta_g + \bar{z}_g\xi_g + a_g)$$
$$= E(y_{gm}|x_g, Z_g, a_g).$$

b. Mechanically, we can get $E(y_{gm}|x_g, Z_g, c_g) = E(y_{gm}|x_g, Z_g, a_g)$ from

$$\int 1[\alpha + x_g\beta + z_{gm}\gamma + \eta_g + \bar{z}_g\xi_g + a_g + u > 0]\,\phi(u)\,du$$

where $\phi(\cdot)$ is the standard normal density. If we want $E(y_{gm}|x_g, Z_g)$ then we integrate out $a_g$ with respect to the Normal$(0, \tau_g^2)$ distribution. Just as in the probit case this is the same as computing

$$E(1[\alpha + x_g\beta + z_{gm}\gamma + \eta_g + \bar{z}_g\xi_g + a_g + u_{gm} > 0]\,|\,x_g, Z_g)$$

where $(a_g + u_{gm})$ is Normal$(0, 1 + \tau_g^2)$ and independent of $(x_g, Z_g)$. Therefore,

$$E(y_{gm}|x_g, Z_g) = \Phi\left[\frac{\alpha + x_g\beta + z_{gm}\gamma + \eta_g + \bar{z}_g\xi_g}{(1 + \tau_g^2)^{1/2}}\right].$$

Notice that $\alpha$ can just be absorbed into $\eta_g$.

c. Under the asymptotic scheme where $G \to \infty$ and the $M_g$ are fixed, there is an upper bound, say $M$, with $M_g \le M$ for all $g$. If we see relatively few group sizes (and lots of data per group size) we can allow the parameters to be different for each $M_g$, with an appropriate normalization. For example, we can have

$$E(y_{gm}|x_g, Z_g) = \Phi\left[\frac{\eta_{M_g} + x_g\beta + z_{gm}\gamma + \bar{z}_g\xi_{M_g}}{(1 + \tau_{M_g}^2)^{1/2}}\right],$$

where $\tau_{M_g}^2$ is set to zero for one value of $M_g$. We can easily estimate all of the parameters using the quasi-log likelihood associated with a “heteroskedastic probit,” where we include in the heteroskedasticity function dummy variables for all but one outcome on $M_g$. And, of course, we include an intercept and dummy variables in the index as well as $\bar{z}_g$ and interactions with the group-size dummies.

d. If we use the Bernoulli QMLE with the mean function discussed in part c, we need to be sure that the inference is robust both to the true distribution not being Bernoulli and to the within-cluster correlation.
Solutions to Chapter 21 Problems 

21.1. a. We use equation (21.5). First, because we have a random sample from the treatment and control groups, $E(\bar{y}_1) = E(y|w = 1)$ and $E(\bar{y}_0) = E(y|w = 0)$. Therefore, by equation (21.5),

$$E(\bar{y}_1 - \bar{y}_0) = [E(y_0|w = 1) - E(y_0|w = 0)] + \tau_{att}.$$

It follows that the bias term for estimating $\tau_{att}$ is given by the first term.

b. If $E(y_0|w = 1) < E(y_0|w = 0)$, those who participate in the program would have had lower average earnings without training than those who chose not to participate. This is a form of self-selection, and, on average, leads to an underestimate of the impact of the program.
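The decomposition in part a also holds exactly in any sample in which both potential outcomes are known, which makes the direction of the bias in part b easy to see. The following is a minimal sketch with made-up numbers; in real data only one of y0 and y1 is observed for each unit.

```python
# Sketch: difference in means = selection-bias term + ATT, on hypothetical data.

def mean(values):
    return sum(values) / len(values)

# Hypothetical potential outcomes (earnings without and with training)
# and participation indicators; participants have lower y0 on average.
y0 = [10.0, 12.0, 8.0, 20.0, 22.0, 18.0]
y1 = [14.0, 15.0, 13.0, 23.0, 24.0, 22.0]
w = [1, 1, 1, 0, 0, 0]

# Observed outcome: y1 for participants, y0 for nonparticipants.
y = [a if t == 1 else b for a, b, t in zip(y1, y0, w)]

diff_in_means = (mean([v for v, t in zip(y, w) if t == 1])
                 - mean([v for v, t in zip(y, w) if t == 0]))
bias = (mean([v for v, t in zip(y0, w) if t == 1])
        - mean([v for v, t in zip(y0, w) if t == 0]))
att = mean([a - b for a, b, t in zip(y1, y0, w) if t == 1])

print(diff_in_means, bias, att)
```

Here the bias term is negative, so the simple difference in means falls well below the average gain among participants, which is the underestimation described in part b.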


21.2. Let $k = [w - p(x)]y/\{p(x)[1 - p(x)]\}$. Then we know from equation (21.21) that

$$E(k|x) = \mu_1(x) - \mu_0(x) = \tau_{ate}(x).$$

Define a dummy variable as $d = 1[x \in R]$. Then, by iterated expectations and the fact that $d$ is a function of $x$,

$$E(y_1 - y_0|d) = E[E(y_1 - y_0|x, d)|d] = E[E(y_1 - y_0|x)|d]$$
$$= E[\tau_{ate}(x)|d] = E[E(k|x)|d] = E(k|d).$$

It follows that $\tau_{ate,R} = E(y_1 - y_0|x \in R) = E(y_1 - y_0|d = 1) = E(k|d = 1)$. Now use the simple relationship

$$E(d \cdot k) = P(d = 1)E(k|d = 1)$$

and so

$$\tau_{ate,R} = \frac{E(d \cdot k)}{P(d = 1)} = \frac{E(d \cdot k)}{P(x \in R)}.$$

If we know the propensity score, a consistent estimator of $E(d \cdot k)$ would be

$$N^{-1}\sum_{i=1}^{N} 1[x_i \in R]k_i,$$

and a consistent estimator of $P(x \in R)$ is just the fraction of observations with $x_i \in R$, call this $N_R/N$. Combining these two estimators and using the expression for $k_i$ gives

$$\hat{\tau}_{ate,R} = N_R^{-1}\sum_{i=1}^{N} 1[x_i \in R]k_i,$$

which is simply the average of $k_i$ over the subset of observations with $x_i \in R$.
21.3. a. The simple regression estimate is $\hat{\tau}_{ate} = .128$, which means that those participating in the job training program have a probability of being unemployed after completing the program that is about .128 higher. Further, its heteroskedasticity-robust t statistic is about four. This appears to be a case of self-selection into training: those who would have a higher chance of being unemployed are also more likely to participate in job training.


reg unem78 train, robust 


Linear regression                                      Number of obs =    2675
                                                       F(  1,  2673) =   15.90
                                                       Prob > F      =  0.0001
                                                       R-squared     =  0.0098
                                                       Root MSE      =  .32779

------------------------------------------------------------------------------
             |               Robust
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |   .1283838   .0321964     3.99   0.000     .0652514    .1915162
       _cons |   .1148594   .0063922    17.97   0.000     .1023252    .1273936
------------------------------------------------------------------------------


b. Adding the controls listed in the problem changes the picture considerably. The estimate of $\tau_{ate}$ is now -.199, so participating in the job training program is estimated to reduce the unemployment probability by about .20. The 95% confidence interval for $\tau_{ate}$ is [-.288, -.111], which clearly excludes zero.
reg unem78 train age educ black hisp married re74 re75 unem75 unem74, robust

Linear regression                                      Number of obs =    2675
                                                       F( 10,  2664) =   64.36
                                                       Prob > F      =
                                                       R-squared     =
                                                       Root MSE      =

----------------------------------------------------------------
             |               Robust
      unem78 |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+--------------------------------------------------
       train |  -.1993525    .045185     -.2879538   -.1107512
         age |   .0028579   .0006397      .0016036    .0041123
        educ |   .0002969   .0020983     -.0038176    .0044114
       black |  -.0179975   .0122695     -.0420563    .0060613
        hisp |  -.0625543   .0250947     -.1117613   -.0133474
     married |  -.0136721   .0173229     -.0476399    .0202957
        re74 |   .0008451    .001004     -.0011236    .0028138
        re75 |
      unem75 |
      unem74 |
       _cons |
----------------------------------------------------------------

c. After running the regressions for the untrained and trained groups separately, we obtain a fitted value (fitted probability) in each state for all 2,675 men in the sample. For each $i$ we estimate the treatment effect conditional on $x$ as

$$\hat{\tau}(x_i) = (\hat{\alpha}_1 + x_i\hat{\beta}_1) - (\hat{\alpha}_0 + x_i\hat{\beta}_0).$$

Then

$$\hat{\tau}_{ate} = N^{-1}\sum_{i=1}^{N} \hat{\tau}(x_i)$$

$$\hat{\tau}_{att} = N_1^{-1}\sum_{i=1}^{N} train_i \cdot \hat{\tau}(x_i).$$

We get $\hat{\tau}_{ate} = -.203$, which is very close to the estimate when we assume $\beta_1 = \beta_0$. The estimate of $\tau_{att}$ is somewhat larger in magnitude: $\hat{\tau}_{att} = -.270$.
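The regression adjustment steps can be sketched with a single covariate. This is an illustration only: the data are made up (outcomes lie exactly on a line in each group so the result can be checked by hand), and the helper `fit_line` is a hypothetical stand-in for the separate regressions on the control and treated groups.

```python
# Sketch: regression adjustment with separate regressions by treatment status.
# tau_hat(x_i) is the difference in fitted values; ATE averages it over everyone,
# ATT averages it over the treated only.

def mean(values):
    return sum(values) / len(values)

def fit_line(x, y):
    """Simple OLS of y on (1, x); returns (intercept, slope)."""
    xbar, ybar = mean(x), mean(y)
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    return ybar - slope * xbar, slope

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
w = [0, 1, 0, 0, 1, 0, 1, 1]
# Hypothetical outcomes: y = 1 + 2x if treated, y = x if not treated.
y = [1.0 + 2.0 * xi if wi == 1 else xi for xi, wi in zip(x, w)]

a0, b0 = fit_line([xi for xi, wi in zip(x, w) if wi == 0],
                  [yi for yi, wi in zip(y, w) if wi == 0])
a1, b1 = fit_line([xi for xi, wi in zip(x, w) if wi == 1],
                  [yi for yi, wi in zip(y, w) if wi == 1])

# Fitted values in both states for every observation, then the difference.
te = [(a1 + b1 * xi) - (a0 + b0 * xi) for xi in x]

tau_ate = mean(te)
tau_att = mean([t for t, wi in zip(te, w) if wi == 1])
print(tau_ate, tau_att)
```

Because the treated units in this made-up sample have larger x and the effect rises with x, the ATT estimate exceeds the ATE estimate; the two coincide only when the slopes are equal or the covariate distributions match.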


reg unem78 age educ black hisp married re74 re75 unem75 unem74 if ~train

      Source |       SS       df       MS              Number of obs =    2490
-------------+------------------------------           F(  9,  2480) =  180.15
       Model |  100.075847     9  11.1195386           Prob > F      =  0.0000
    Residual |  153.074354  2480   .06172353           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |  253.150201  2489  .101707594           Root MSE      =

------------------------------------------------------------------------------
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0021732   .0005579     3.90   0.000     .0010791    .0032673
        educ |  -.0014064   .0019407    -0.72   0.469     -.005212    .0023992
       black |  -.0173876   .0125968    -1.38   0.168    -.0420889    .0073136
        hisp |  -.0517355   .0285084    -1.81   0.070    -.1076382    .0041672
     married |  -.0149914   .0151672    -0.99   0.323    -.0447332    .0147503
        re74 |   .0014736   .0007966     1.85   0.064    -.0000884    .0030356
        re75 |  -.0035097   .0007814    -4.49   0.000    -.0050419   -.0019774
      unem75 |   .3435381   .0257242    13.35   0.000      .293095    .3939813
      unem74 |   .3363692   .0275345    12.22   0.000     .2823763    .3903622
       _cons |   .0500675   .0349642     1.43   0.152    -.0184946    .1186296
------------------------------------------------------------------------------

predict unem78_0
(option xb assumed; fitted values)

reg unem78 age educ black hisp married re74 re75 unem75 unem74 if train

      Source |       SS       df       MS              Number of obs =     185
-------------+------------------------------           F(  9,   175) =
       Model |  2.71236085     9  .301373428           Prob > F      =
    Residual |  31.3416932   175   .17909539           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |  34.0540541   184  .185076381           Root MSE      =

------------------------------------------------------------------------------
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0022981   .0046702    -0.49   0.623    -.0115153    .0069192
        educ |   -.008484   .0158595    -0.53   0.593    -.0397845    .0228166
       black |   .1374346   .1067107     1.29   0.199     -.073171    .3480401
        hisp |  -.1412636   .1655747    -0.85   0.395     -.468044    .1855168
     married |  -.0761776   .0855254    -0.89   0.374    -.2449717    .0926165
        re74 |  -.0019756   .0098056    -0.20   0.841    -.0213281     .017377
        re75 |   -.010362    .014196    -0.73   0.466    -.0383794    .0176553
      unem75 |   .1822138   .1020566     1.79   0.076    -.0192063    .3836338
      unem74 |   -.233911   .1194775    -1.96   0.052    -.4697132    .0018912
       _cons |   .3735869   .2407415     1.55   0.123    -.1015435    .8487174
------------------------------------------------------------------------------

predict unem78_1
(option xb assumed; fitted values)

gen te = unem78_1 - unem78_0

sum te

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          te |      2675   -.2031515    .2448774    -1.5703   .3241221

sum te if train

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          te |       185   -.2698234     .309953  -.7017545   .3241221


d. Using the subsample of men who were unemployed in 1974, 1975, or both gives $\hat{\tau}_{ate} = -.625$ and $\hat{\tau}_{att} = -.194$. The estimate of $\tau_{ate}$ is much larger in magnitude than on the full sample and $\hat{\tau}_{att}$ is somewhat smaller.


keep if unem74 | unem75 
(2240 observations deleted) 


reg unem78 age educ black hisp married re74 re75 if ~train 


      Source |       SS       df       MS              Number of obs =     302
-------------+------------------------------           F(  7,   294) =   12.93
       Model |  17.2414134     7  2.46305906           Prob > F      =  0.0000
    Residual |   56.020176   294  .190544816           R-squared     =  0.2353
-------------+------------------------------           Adj R-squared =  0.2171
       Total |  73.2615894   301  .243393985           Root MSE      =  .43651

------------------------------------------------------------------------------
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0121428   .0025998     4.67   0.000     .0070262    .0172594
        educ |  -.0000954   .0090746    -0.01   0.992    -.0179548     .017764
       black |  -.0713435   .0713164    -1.00   0.318    -.2116989    .0690119
        hisp |  -.1965901   .1220144    -1.61   0.108    -.4367224    .0435422
     married |   .0610631    .075997     0.80   0.422     -.088504    .2106302
        re74 |  -.0094196   .0029031    -3.24   0.001    -.0151331   -.0037061
        re75 |  -.0190763   .0029208    -6.53   0.000    -.0248247    -.013328
       _cons |   .1819096   .1810088     1.00   0.316    -.1743278    .5381469
------------------------------------------------------------------------------

predict unem78_0
(option xb assumed; fitted values)

reg unem78 age educ black hisp married re74 re75 if train

      Source |       SS       df       MS              Number of obs =     133
-------------+------------------------------           F(  7,   125) =    1.90
       Model |  2.33329022     7  .333327175           Prob > F      =  0.0754
    Residual |  21.9674617   125  .175739693           R-squared     =  0.0960
-------------+------------------------------           Adj R-squared =  0.0454
       Total |  24.3007519   132  .184096605           Root MSE      =  .41921

------------------------------------------------------------------------------
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0058054   .0049952    -1.16   0.247    -.0156914    .0040807
        educ |  -.0267626   .0175847    -1.52   0.131    -.0615649    .0080397
       black |   .1754782   .1201604     1.46   0.147    -.0623342    .4132906
        hisp |  -.1106474   .2078183    -0.53   0.595    -.5219455    .3006508
     married |  -.1606594   .1015391    -1.58   0.116    -.3616179     .040299
        re74 |  -.0150277    .066169    -0.23   0.821    -.1459844    .1159289
        re75 |  -.0269891   .0282243    -0.96   0.341    -.0828484    .0288702
       _cons |   .5632464    .253897     2.22   0.028     .0607527     1.06574
------------------------------------------------------------------------------

predict unem78_1
(option xb assumed; fitted values)

gen te = unem78_1 - unem78_0

sum te

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          te |       435    -.625014    .3867973   -1.62662   .1450891

sum te if train

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          te |       133   -.1935882    .2039181  -.7526801   .1450891


e. We use the entire set of data for this part. The logit model for train is estimated below. 


Of the 2,675 observations, 78 failures are completely determined. This means that the overlap 


assumption fails because for some values of x the probability of being in the training group is 


zero. If we are interested in the ATE then our only recourse is to redefine the population so 


that each unit has a nonzero chance of being in the treated group (and a nonzero chance of 


being in the control group, which is not a problem in this example). 


logit train age educ black hisp married re74 re75 unem74 unem75 


Logistic regression                               Number of obs   =       2675
                                                  LR chi2(9)      =     926.52
                                                  Prob > chi2     =     0.0000
Log likelihood = -209.38931                       Pseudo R2       =     0.6887

------------------------------------------------------------------------
       train |      Coef.   Std. Err.    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------
         age |  -.1109206   .0177106     0.000    -.1456327   -.0762084
        educ |  -.1008807   .0561133     0.072    -.2108608    .0090994
       black |   2.650097   .3605668     0.000     1.943399    3.356795
        hisp |   2.247747   .5908963     0.000     1.089611    3.405882
     married |  -1.560628   .2817913     0.000    -2.112928   -1.008327
        re74 |   .0201797   .0313149     0.519    -.0411963    .0815557
        re75 |  -.2743162   .0477066     0.000    -.3678194   -.1808129
      unem74 |   3.272456   .4887585     0.000     2.314507    4.230405
      unem75 |  -1.371405   .4545789     0.003    -2.262363   -.4804465
       _cons |   1.794543    .979261     0.067    -.1247735    3.713859
------------------------------------------------------------------------
Note: 78 failures and 0 successes completely determined.


f. The Stata session is below. The IPW estimate is $\hat{\tau}_{ate,psw} = -.132$. The standard error that adjusts for the first-step estimation is about .0504. If we do not take advantage of the smaller asymptotic variance due to estimating the propensity score, the standard error is .0580, which is about 15% larger. The estimate of $\tau_{att}$ is similar, about -.124.

If we assume a constant treatment effect in using regression adjustment, $\hat{\tau}_{ate,reg} = \hat{\tau}_{att,reg} = -.235$ and its standard error is .0509. Interestingly, this is very close to the standard error for $\hat{\tau}_{ate,psw}$, but the estimate is much larger in magnitude, leading to a large t statistic. Unfortunately, it appears separate regressions are warranted, and this changes $\hat{\tau}_{ate,reg}$ to -.119 (although $\hat{\tau}_{att,reg} = -.294$). The standard error for $\hat{\tau}_{ate,reg}$ that does not even account for the randomness in the sample averages is quite large, .0911, and so $\hat{\tau}_{ate,reg}$ is barely statistically different from zero at the 10% level if we use a one-sided alternative. The IPW estimator appears to be more efficient for this application. (It could have something to do with using linear regression adjustment rather than, say, probit or logit.) The joint test of the interaction terms shows separate regressions are warranted.
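The two weighted variables generated in the Stata session (kate and katt) can be written out directly. The sketch below is an illustration, not the author's code: the function name `ipw_ate_att` and the small data set are hypothetical, and the last line only redoes the arithmetic on the two standard errors reported in the output.

```python
# Sketch: IPW estimates of the ATE and ATT from outcomes, treatment indicators,
# and propensity scores, following the formulas used for kate and katt.

def mean(values):
    return sum(values) / len(values)

def ipw_ate_att(y, w, p):
    """Return (ATE estimate, ATT estimate) as sample averages of kate_i and katt_i."""
    rho = mean(w)  # fraction treated (rhohat in the Stata session)
    kate = [(wi - pi) * yi / (pi * (1.0 - pi)) for yi, wi, pi in zip(y, w, p)]
    katt = [(wi - pi) * yi / (rho * (1.0 - pi)) for yi, wi, pi in zip(y, w, p)]
    return mean(kate), mean(katt)

# Hypothetical binary outcomes and treatment indicators.
y = [1, 0, 0, 1, 1, 0, 0, 1]
w = [1, 1, 0, 0, 1, 0, 0, 0]

# Special case for checking: a constant propensity score equal to the treated fraction.
p = [mean(w)] * len(y)
ate_hat, att_hat = ipw_ate_att(y, w, p)
diff_in_means = (mean([yi for yi, wi in zip(y, w) if wi == 1])
                 - mean([yi for yi, wi in zip(y, w) if wi == 0]))

# Ratio of the unadjusted to the adjusted standard error reported in the output.
se_ratio = 0.0580168 / 0.0504388
print(ate_hat, att_hat, diff_in_means, round(se_ratio, 2))
```

When the propensity score is constant, both weighted averages reduce to the simple difference in means, which gives a quick sanity check on the formulas; the standard error ratio reproduces the "about 15% larger" comparison.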


keep if avgre <= 15 
(1513 observations deleted) 


logit train age educ black hisp married re74 re75 unem74 unem75 


Logistic regression                               Number of obs   =       1162
                                                  LR chi2(9)      =
                                                  Prob > chi2     =
Log likelihood = -180.28028                       Pseudo R2       =

------------------------------------------------------------------------------
       train |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.1155512   .0187215            0.000    -.1522447   -.0788577
        educ |  -.1049275   .0591078            0.076    -.2207766    .0109217
       black |   2.608068   .3772016            0.000     1.868767     3.34737
        hisp |   2.395905   .6292337            0.000     1.162629     3.62918
     married |  -1.631159   .3038189            0.000    -2.226633   -1.035685
        re74 |  -.0290672     .04281            0.497    -.1129732    .0548387
        re75 |  -.3794923   .0682029            0.000    -.5131676    -.245817
      unem74 |   3.009282   .5221746            0.000     1.985839    4.032726
      unem75 |  -1.751808   .4995608    -3.51   0.000    -2.730929   -.7726867
       _cons |   2.695208   1.053604     2.56   0.011     .6301819    4.760234
------------------------------------------------------------------------------
predict phat 
(option pr assumed; Pr(train) ) 
tab train

   =1 if in |
        job |
   training |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        982       84.51       84.51
          1 |        180       15.49      100.00
------------+-----------------------------------
      Total |      1,162      100.00

sum train

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       train |      1162    .1549053    .3619702          0          1

gen rhohat = r(mean)

gen kate = ((train - phat)*unem78)/(phat*(1 - phat))

gen katt = ((train - phat)*unem78)/(rhohat*(1 - phat))

sum kate katt

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        kate |      1162   -.1319506    1.977683  -16.62496   56.51032
        katt |      1162   -.1243131    4.922131  -100.8678   6.455555


* Get the correct standard error for the ATE estimate.

gen uh = train - phat
gen ageuh = age*uh
gen educuh = educ*uh
gen blackuh = black*uh
gen hispuh = hisp*uh
gen marrieduh = married*uh
gen re74uh = re74*uh
gen re75uh = re75*uh
gen unem74uh = unem74*uh
gen unem75uh = unem75*uh
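reg kate uh ageuh educuh blackuh hispuh marrieduh re74uh re75uh unem74uh unem75uh

      Source |       SS       df       MS              Number of obs =    1162
-------------+------------------------------           F( 10,  1151) =   38.51
       Model |  1138.33705    10  113.833705           Prob > F      =  0.0000
    Residual |  3402.59957  1151  2.95621161           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |  4540.93661  1161  3.91122878           Root MSE      =

------------------------------------------------------------------------
        kate |      Coef.   Std. Err.      t      [95% Conf. Interval]
-------------+----------------------------------------------------------
          uh |   3.525428   1.973887     1.79    -.3473914    7.398247
       ageuh |   .0016821   .0350983     0.05    -.0671816    .0705458
      educuh |   .2945194   .1191697     2.47     .0607052    .5283336
     blackuh |  -3.176048   .6611273    -4.80    -4.473198   -1.878898
      hispuh |  -5.475508   1.012662    -5.41    -7.462378   -3.488638
   marrieduh |   4.005544   .5475872     7.31     2.931163    5.079926
      re74uh |   .3468368    .075946     4.57     .1978287     .495845
      re75uh |  -.8364872   .1060216    -7.89    -1.044504     -.62847
    unem74uh |  -2.607257    .818097    -3.19    -4.212386   -1.002129
    unem75uh |   .2278527    .796608     0.29    -1.335114    1.790819
       _cons |  -.1319506   .0504388    -2.62    -.2309129   -.0329883
------------------------------------------------------------------------

di e(rmse)/sqrt(e(N))
.05043879

di -.1320/.0504
-2.6190476

reg kate

      Source |       SS       df       MS              Number of obs =    1162
-------------+------------------------------           F(  0,  1161) =
       Model |           0     0           .           Prob > F      =
    Residual |  4540.93661  1161  3.91122878           R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |  4540.93661  1161  3.91122878           Root MSE      =

----------------------------------------------
        kate |      Coef.   Std. Err.      t
-------------+--------------------------------
       _cons |  -.1319506   .0580168    -2.27
----------------------------------------------

reg unem78 train, robust

Linear regression                                      Number of obs =    1162
                                                       F(  1,  1160) =
                                                       Prob > F      =
                                                       R-squared     =
                                                       Root MSE      =

------------------------------------------------------------------------------
             |               Robust
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |   .0147658   .0350282     0.42   0.673    -.0539599    .0834915
       _cons |   .2352342   .0135467    17.36   0.000     .2086555    .2618129
------------------------------------------------------------------------------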

reg unem78 train age educ black hisp married re74 re75 unem74 unem75, robust

Linear regression                                      Number of obs =    1162
                                                       F( 10,  1151) =   61.27
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.3312
                                                       Root MSE      =  .34968

             |               Robust
      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       train |  -.2349689   .0509218    -4.61   0.000    -.3348787    -.135059
         age |   .0059358   .0012367     4.80   0.000     .0035094    .0083622
        educ |   .0022623   .0042076     0.54   0.591     -.005993    .0105177
       black |  -.0202408    .022745    -0.89   0.374    -.0648671    .0243855
        hisp |   -.100478   .0399462    -2.52   0.012    -.1788536   -.0221024
     married |  -.0352163   .0272463    -1.29   0.196    -.0886743    .0182417
        re74 |  -.0010355    .002876    -0.36   0.719    -.0066783    .0046073
        re75 |  -.0177354   .0024155    -7.34   0.000    -.0224746   -.0129961
      unem74 |   .2220472    .051956     4.27   0.000     .1201081    .3239863
      unem75 |   .1439644    .048573     2.96   0.003     .0486629    .2392658
       _cons |   .1103197   .0759773     1.45   0.147    -.0387499    .2593893


reg unem78 age educ black hisp married re74 re75 unem74 unem75 if ~train

      Source |       SS       df       MS              Number of obs =     982
-------------+------------------------------           F(  9,   972) =   86.39
       Model |   78.510332     9  8.72337022           Prob > F      =  0.0000
    Residual |  98.1505642   972  .100977947           R-squared     =  0.4444
-------------+------------------------------           Adj R-squared =  0.4393
       Total |  176.660896   981  .180082463           Root MSE      =  .31777

      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0050777   .0011449     4.43   0.000     .0028309    .0073245
        educ |  -.0002579   .0039421    -0.07   0.948    -.0079938    .0074781
       black |  -.0146538   .0238818    -0.61   0.540    -.0615196    .0322119
        hisp |  -.0862098   .0524883    -1.64   0.101    -.1892132    .0167936
     married |  -.0424904   .0262258    -1.62   0.106     -.093956    .0089752
        re74 |   .0022784   .0024006     0.95   0.343    -.0024325    .0069892
        re75 |  -.0143134   .0025479    -5.62   0.000    -.0193134   -.0093133
      unem74 |   .3521536   .0435278     8.09   0.000     .2667344    .4375729
      unem75 |   .1965244   .0423339     4.64   0.000     .1134481    .2796007
       _cons |   .0770668   .0757616     1.02   0.309    -.0716084    .2257419

predict unem78_0
(option xb assumed; fitted values)


reg unem78 age educ black hisp married re74 re75 unem74 unem75 if train

      Source |       SS       df       MS              Number of obs =     180
-------------+------------------------------           F(  9,   170) =    1.57
       Model |  2.58861704     9  .287624115           Prob > F      =  0.1281
    Residual |   31.161383   170  .183302253           R-squared     =  0.0767
-------------+------------------------------           Adj R-squared =  0.0278
       Total |       33.75   179  .188547486           Root MSE      =  .42814

      unem78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.002544   .0047571    -0.53   0.593    -.0119347    .0068466
        educ |  -.0086994   .0162153    -0.54   0.592    -.0407086    .0233098
       black |   .1402344    .108018     1.30   0.196     -.072995    .3534638
        hisp |  -.1480334   .1683835    -0.88   0.381    -.4804252    .1843585
     married |  -.0713415   .0879005    -0.81   0.418    -.2448585    .1021756
        re74 |   .0073134   .0145599     0.50   0.616     -.021428    .0360549
        re75 |  -.0064075   .0214837    -0.30   0.766    -.0488166    .0360016
      unem74 |  -.1885821   .1321929    -1.43   0.156    -.4495331    .0723688
      unem75 |   .1935779   .1115475     1.74   0.084    -.0266186    .4137745
       _cons |   .3229791   .2550769     1.27   0.207    -.1805469    .8265051

predict unem78_1
(option xb assumed; fitted values)

gen te = unem78_1 - unem78_0
sum te
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          te |      1162   -.1193285    .3326819  -.9173806   .3599507
sum te if train
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          te |       180   -.2941826    .2835388   -.128443   .2494144


egen mage = mean(age) 

gen trainage = train*(age - mage) 

egen meduc = mean(educ) 

gen traineduc = train*(educ - meduc) 

egen mblack = mean(black) 

gen trainblack = train*(black - mblack) 
egen mhisp = mean(hisp) 

gen trainhisp = train*(hisp - mhisp) 

egen mmarried = mean(married) 

gen trainmarried = train*(married - mmarried) 
egen mre74 = mean(re74) 

gen trainre74 = train*(re74 - mre74) 

egen mre75 = mean(re75) 

gen trainre75 = train*(re75 - mre75) 

egen munem74 = mean(unem74) 

gen trainunem74 = train*(unem74 - munem74) 
egen munem75 = mean(unem75) 

gen trainunem75 = train*(unem75 - munem75) 


reg unem78 train age educ black hisp married re74 re75 unem74 unem75
    trainage traineduc trainblack trainhisp trainmarried trainre74
    trainre75 trainunem74 trainunem75, robust

Linear regression                                      Number of obs =    1162

             |               Robust
      unem78 |      Coef.   Std. Err.      t     [95% Conf. Interval]
-------------+-------------------------------------------------------
       train |  -.1193284   .0910893    -1.31    -.2980497    .0593928
         age |   .0050777   .0012214     4.16     .0026812    .0074742
        educ |  -.0002579   .0041835    -0.06     -.008466    .0079503
       black |  -.0146538   .0231254    -0.63    -.0600269    .0307192
        hisp |  -.0862098   .0424936    -2.03     -.169584   -.0028356
     married |  -.0424904   .0277233    -1.53    -.0968847    .0119039
        re74 |   .0022784   .0028885     0.79     -.003389    .0079457
        re75 |  -.0143134   .0025811    -5.55    -.0193776   -.0092491
      unem74 |   .3521536   .0566374     6.22     .2410286    .4632787
      unem75 |   .1965244   .0570442     3.45     .0846013    .3084475
    trainage |  -.0076217   .0050195    -1.52    -.0174702    .0022267
   traineduc |  -.0084415   .0152788    -0.55    -.0384192    .0215361
  trainblack |   .1548883   .0871611     1.78    -.0161256    .3259022
   trainhisp |  -.0618236   .0975772    -0.63    -.2532742    .1296271
trainmarried |  -.0288511   .0822415    -0.35    -.1902126    .1325104
   trainre74 |   .0050351   .0166047     0.30     -.027544    .0376142
   trainre75 |   .0079059   .0185161     0.43
 trainunem74 |  -.5407358   .1357835    -3.98
 trainunem75 |  -.0029465   .0975097    -0.03
       _cons |   .0770668   .0760186     1.01

test trainage traineduc trainblack trainhisp trainmarried trainre74
    trainre75 trainunem74 trainunem75

 ( 1)  trainage = 0
 ( 2)  traineduc = 0
 ( 3)  trainblack = 0
 ( 4)  trainhisp = 0
 ( 5)  trainmarried = 0
 ( 6)  trainre74 = 0
 ( 7)  trainre75 = 0
 ( 8)  trainunem74 = 0
 ( 9)  trainunem75 = 0

       F(  9,  1142) =    8.61
            Prob > F =    0.0000


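21.4. The integral is equivalent to

\[ \int_{-(\theta_0 + x\theta_1 + z\theta_2)}^{\infty} a\,\phi(a)\,da. \]

Because $d\phi(a)/da = -a\phi(a)$, the antiderivative of $a\phi(a)$ is simply $-\phi(a)$. Now

\[ \int_{-(\theta_0 + x\theta_1 + z\theta_2)}^{m} a\,\phi(a)\,da = -\phi(a)\Big|_{-(\theta_0 + x\theta_1 + z\theta_2)}^{m} = -\phi(m) + \phi[-(\theta_0 + x\theta_1 + z\theta_2)] = -\phi(m) + \phi(\theta_0 + x\theta_1 + z\theta_2), \]

where we use the symmetry of $\phi(\cdot)$. As $m \to \infty$, $\phi(m) \to 0$. Therefore,

\[ \int_{-(\theta_0 + x\theta_1 + z\theta_2)}^{\infty} a\,\phi(a)\,da = \phi(\theta_0 + x\theta_1 + z\theta_2). \]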
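
The closed form in Problem 21.4 is easy to check numerically. The sketch below is only an illustration: the value `c = 0.7` standing in for the index, the truncation point `upper = 12.0`, and the function names are all hypothetical choices, and the trapezoid rule is used simply because it needs nothing beyond the standard library.

```python
import math


def phi(a):
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def truncated_first_moment(c, upper=12.0, steps=200_000):
    """Trapezoid-rule approximation to the integral of a*phi(a) over [-c, upper]."""
    lo = -c
    h = (upper - lo) / steps
    total = 0.5 * (lo * phi(lo) + upper * phi(upper))
    for k in range(1, steps):
        a = lo + k * h
        total += a * phi(a)
    return total * h


c = 0.7  # hypothetical value of the index theta_0 + x*theta_1 + z*theta_2
approx = truncated_first_moment(c)  # numerical integral from -c to (effectively) infinity
exact = phi(c)                      # closed form derived in the solution
```

The two numbers should agree to several decimal places, because the integrand is negligible beyond the truncation point.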


21.5. The Stata output to answer all parts follows. 

a. The first two Stata commands are used to obtain the probit fitted values, called PHIhat.

b. The IV estimate of $\tau$ is -43.27 and its standard error is huge, 585.78. Clearly we can
learn nothing of value from this estimate.

c. The collinearity suspected in part b is confirmed by regressing $\hat\Phi_i$ on the $x_i$: the
R-squared is .9989, which means there is almost no separate variation in $\hat\Phi_i$ that cannot be
explained by $x_i$.

d. This example illustrates why trying to achieve identification off of a nonlinearity can be
fraught with problems. In cases with larger sample sizes the estimates may seem more
reasonable, but we are only able to compute estimates at all because of the presumed
functional form for $P(w|x)$. A good general rule is that if a linear IV approach does not identify
$\tau$ then we should not hope to learn anything useful by introducing nonlinearity in $P(w|x)$.
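
The near collinearity in part c arises because a probit response probability is almost linear in its index when the index varies over a narrow range. The sketch below illustrates this with a single hypothetical regressor on an evenly spaced grid and made-up index coefficients (-0.2 and 0.3); it is not a reconstruction of the JTRAIN2 calculation.

```python
import math


def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def r_squared(x, y):
    """R-squared from the simple OLS regression of y on a constant and x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical regressor on [-1, 1] and hypothetical probit index -0.2 + 0.3*x.
x = [-1.0 + 2.0 * k / 200 for k in range(201)]
phi_hat = [norm_cdf(-0.2 + 0.3 * xi) for xi in x]
r2 = r_squared(x, phi_hat)  # very close to one: almost no nonlinear variation
```

When the R-squared from such a regression is this close to one, the fitted probability carries almost no information beyond the regressors themselves, which is why it makes a poor instrument.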


. probit train age educ black hisp married re74 re75

Probit regression                                 Number of obs   =        445
                                                  LR chi2(7)      =       8.60
                                                  Prob > chi2     =     0.2829
Log likelihood = -297.80166                       Pseudo R2       =     0.0142

       train |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0066826   .0087391     0.76   0.444    -.0104458    .0238109
        educ |   .0387341   .0341574     1.13   0.257    -.0282132    .1056815
       black |  -.2216642   .2242952            0.323    -.6612747    .2179463
        hisp |  -.5753033   .3062908            0.060    -1.175622    .0250157
     married |   .0900855   .1703412            0.597    -.2437771    .4239482
        re74 |  -.0138226   .0155792            0.375    -.0443572     .016712
        re75 |    .028755   .0267469            0.282    -.0236679    .0811779
       _cons |  -.5715372    .475416            0.229    -1.503335    .3602609


predict PHIhat 
(option pr assumed; Pr(train)) 


ivreg re78 age educ black hisp married re74 re75 (train = PHIhat)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------
       Model | -213187.422     8 -26648.4277
    Residual |  232713.078   436  533.745593
-------------+------------------------------
       Total |  19525.6566   444  43.9767041

        re78 |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       train |  -43.26513   585.7793     -1194.567    1108.037
         age |   .1717735   1.520127     -2.815914    3.159461
        educ |   1.067645   8.646843     -15.92703    18.06232
       black |  -6.114187   51.56931     -107.4695    95.24116
        hisp |  -9.523185    126.287     -257.7302    238.6838
     married |   1.432202   20.72909     -39.30917    42.17357
        re74 |  -.1443703   2.973787      -5.98911     5.70037
        re75 |   .5327602     6.2896     -11.82894    12.89447
       _cons |   13.30468    165.517     -312.0058    338.6151

Instrumented:  train
Instruments:   age educ black hisp married re74 re75 PHIhat



reg PHIhat age educ black hisp married re74 re75

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  7,   437) =55245.15
       Model |  2.04859095     7  .292655851           Prob > F      =  0.0000
    Residual |  .002314965   437  5.2974e-06           R-squared     =  0.9989
-------------+------------------------------
       Total |  2.05090592   444  .004619157

      PHIhat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0025883   .0000158   163.53   0.000     .0025572    .0026194
        educ |   .0146708    .000062   236.45   0.000     .0145488    .0147927
       black |  -.0875955   .0004094  -213.96   0.000    -.0884002   -.0867909
        hisp |  -.2156441   .0005445  -396.02   0.000    -.2167143   -.2145739
     married |   .0351309   .0003107   113.08   0.000     .0345203    .0357415
        re74 |  -.0051274   .0000271  -189.22   0.000    -.0051807   -.0050742
        re75 |   .0108521   .0000474   228.89   0.000     .0107589    .0109453
       _cons |   .2823687   .0008635   326.99   0.000     .2806715    .2840659



21.6. As in Procedure 21.1, the IV estimator is consistent whether or not $G(x,z;\gamma)$ is
correctly specified for $P(w = 1|x,z)$. The OLS estimator from $y_i$ on $1, \hat G_i, x_i, \hat G_i \cdot (x_i - \bar{x})$,
$i = 1,\ldots,N$ generally requires the model for $P(w = 1|x,z)$ to be correctly specified. This can
be seen by writing

\[ E(y|x,z) = \gamma + \tau E(w|x,z) + x\beta_0 + E(w|x,z)\cdot(x - \psi)\delta, \]

which is the estimating equation underlying the OLS regression on probit fitted values and the
interactions. If $E(w|x,z) = P(w = 1|x,z) \neq G(x,z;\gamma)$ for all $\gamma$ then plugging in $\hat G_i$ generally
produces inconsistent estimators.

Even if $G(x,z;\gamma)$ is correctly specified, the standard errors for the two-step OLS estimator
are harder to obtain. One must use the material on generated regressors in Chapter 6 or
apply the bootstrap.

21.7. a. There are several options. To estimate the mean parameters, we can use Poisson 
regression (especially in the case where w is a count variable) or gamma regression (if w is 
nonnegative and continuous). Of course, we can use NLS, too (which is also a QMLE in the 
LEF). 

As stated in the hint, if we define $r = w - E(w|x)$ then
$E(r^2|x) = \mathrm{Var}(w|x) = \exp(\delta_0 + x\delta_1)$. Therefore, if we observed $r^2$, we could use it as the
dependent variable in, say, a gamma or negative binomial QMLE. In practice, we use
$\hat r_i = w_i - \exp(\hat\gamma_0 + x_i\hat\gamma_1)$, the residuals from estimating the mean parameters.


b. By the law of large numbers, a consistent estimator of $\beta$ is

\[ N^{-1}\sum_{i=1}^{N} \frac{[w_i - \psi(x_i)]\,y_i}{\omega(x_i)}. \]

By the usual argument, we can replace $\psi(x_i)$ and $\omega(x_i)$ with consistent estimators; more
precisely, in a parametric context replace the unknown parameters with consistent estimators.
In the case of exponential mean and variance functions,

\[ \hat\beta = N^{-1}\sum_{i=1}^{N} \frac{[w_i - \exp(\hat\gamma_0 + x_i\hat\gamma_1)]\,y_i}{\exp(\hat\delta_0 + x_i\hat\delta_1)}. \]

We can use Problem 12.17 to get a standard error for $\hat\beta$ or use the bootstrap.
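
The estimator in part b is just a sample average once fitted means and variances are in hand; this is what the Stata commands `gen kh = ...` and `sum kh` compute in part c. A minimal Python sketch follows. The function name `weighted_effect` and the three-observation inputs are hypothetical, and the fitted means `psi_hat` and variances `omega_hat` are taken as given rather than estimated here.

```python
from statistics import mean


def weighted_effect(w, y, psi_hat, omega_hat):
    """Sample analog: N^{-1} * sum_i (w_i - psi_hat_i) * y_i / omega_hat_i."""
    if any(om <= 0 for om in omega_hat):
        # The method only makes sense with positive variance estimates.
        raise ValueError("estimated conditional variances must be positive")
    k = [(wi - ps) * yi / om for wi, yi, ps, om in zip(w, y, psi_hat, omega_hat)]
    return mean(k)


# Hypothetical inputs: fitted means all equal 2, fitted variances all equal 1.
beta_hat = weighted_effect(
    w=[1.0, 2.0, 3.0],
    y=[1.0, 2.0, 4.0],
    psi_hat=[2.0, 2.0, 2.0],
    omega_hat=[1.0, 1.0, 1.0],
)
```

The explicit positivity check anticipates the issue raised in part d, where fitted variances from a linear regression can turn out negative.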


c. I use Poisson regression to estimate the mean parameters and then gamma regression to
estimate the variance parameters. The resulting estimate of $\beta$ is about .102; the standard error
is not reported. If we ignore estimation of the parameters in $E(w|x)$ and $\mathrm{Var}(w|x)$ then the
standard error is about .050.

There is not much reason to compute a standard error for $\hat\beta$ because the standard regression
adjustment estimate, $\tilde\beta$, is very close, and provides a valid standard error. Namely, running the
regression

\[ re78_i \text{ on } 1,\ mostrn_i,\ age_i,\ educ_i,\ black_i,\ hisp_i,\ married_i,\ re74_i,\ re75_i \]

gives $\tilde\beta = .103$ (se = .038). With random assignment to the job training program it is perhaps
not too surprising to see the methods give similar estimates. In fact, the simple regression
estimate is .112 (se = .038).


glm mostrn age educ black hisp married re74 re75, fam(poiss) link(log) robust

Generalized linear models                          No. of obs      =       445
Optimization     : ML                              Residual df     =       437
Deviance         =  6136.777616                    (1/df) Deviance =  14.04297
Pearson          =  5296.290311                    (1/df) Pearson  =  12.11966

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  15.78862
Log pseudolikelihood = -3504.968642                BIC             =  3471.919

             |               Robust
      mostrn |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         age |   .0037486   .0081575     -.0122398    .0197369
        educ |   .0448349   .0344975     -.0227789    .1124487
       black |  -.1809126   .1906097     -.5545006    .1926755
        hisp |  -.4907343   .3198765     -1.117681    .1362121
     married |   .0824876   .1620227     -.2350709    .4000462
        re74 |  -.0048997   .0140984     -.0325322    .0227327
        re75 |   .0388417   .0197161      .0001989    .0774846
       _cons |   1.605453   .4551975      .7132821    2.497624

predict mostrnh
(option mu assumed; predicted mean mostrn)

gen rh = mostrn - mostrnh
gen rhsq = rh^2



glm rhsq age educ black hisp married re74 re75, fam(gamma) link(log) robust

Generalized linear models
Optimization     : ML
Deviance         =  251.4433046                    (1/df) Deviance =  .5753851
Pearson          =  301.0457257                    (1/df) Pearson  =  .6888918

Variance function: V(u) = u^2                      [Gamma]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  11.01458
Log pseudolikelihood = -2442.743803                BIC             = -2413.415

             |               Robust
        rhsq |      Coef.   Std. Err.    P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------
         age |  -.0020813   .0054739     0.704      -.01281    .0086475
        educ |   .0206816   .0254745     0.417    -.0292474    .0706107
       black |  -.0424931   .1083251     0.695    -.2548063    .1698202
        hisp |  -.2107907   .2173269     0.332    -.6367437    .2151623
     married |                           0.676    -.1445427
        re74 |                           0.379    -.0116116
        re75 |                           0.007     .0139084
       _cons |                           0.000     3.660668

predict omegah
(option mu assumed; predicted mean rhsq)

sum omegah
    Variable |        Min        Max
-------------+----------------------
      omegah |    60.9556   369.0591



gen kh = (mostrn - mostrnh)*re78/omegah
sum kh
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          kh |       445    .1024405    1.047671  -3.030556   10.28323

reg kh

      Source |       SS       df       MS              Number of obs =     445
-------------+------------------------------           F(  0,   444) =    0.00
       Model |           0     0           .           Prob > F      =       .
    Residual |   487.34086   444  1.09761455           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared =  0.0000
       Total |   487.34086   444  1.09761455           Root MSE      =  1.0477

          kh |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .1024405   .0496644     2.06   0.040     .0048341     .200047

reg re78 mostrn age educ black hisp married re74 re75, robust

Linear regression                                      Number of obs =     445
                                                       F(  8,   436) =    3.09
                                                       Prob > F      =  0.0021
                                                       R-squared     =  0.0613
                                                       Root MSE      =  6.4838

             |               Robust
        re78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mostrn |    .102825   .0380686     2.70   0.007     .0280043    .1776458
         age |   .0570883   .0399249     1.43   0.153    -.0213808    .1355575
        educ |   .3980183   .1548109     2.57   0.010       .09375    .7022867
       black |  -2.150926   1.007271    -2.14   0.033    -4.130637   -.1712163
        hisp |   .1712523   1.365153     0.13   0.900    -2.511846     2.85435
     married |   -.154993   .8733899    -0.18   0.859    -1.871571    1.561585
        re74 |   .0788359   .1071444     0.74   0.462    -.1317478    .2894197
        re75 |   .0305561   .1266573     0.24   0.809    -.2183787    .2794909
       _cons |   .6004532   2.366495     0.25   0.800    -4.050703     5.25161

reg re78 mostrn, robust

Linear regression                                      Number of obs =     445
                                                       F(  1,   443) =    8.66
                                                       Prob > F      =  0.0034
                                                       R-squared     =  0.0269
                                                       Root MSE      =  6.5491

             |               Robust
        re78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mostrn |   .1126397   .0382802     2.94   0.003     .0374063    .1878731
       _cons |   4.434831   .3358041    13.21   0.000     3.774864    5.094798


d. Because $E(w|x)$ follows a logistic regression model we can use fractional logit (that is,
maximize the Bernoulli QMLE). After we have estimated the mean parameters $\gamma_0$ and $\gamma_1$, we
form the fitted values and residuals

\[ \hat w_i = \Lambda(\hat\gamma_0 + x_i\hat\gamma_1), \qquad \hat r_i = w_i - \hat w_i, \]

and then estimate $\delta_0$, $\delta_1$, and $\delta_2$ from the OLS regression

\[ \hat r_i^2 \text{ on } 1,\ \hat w_i,\ \hat w_i^2 \]

to get the variance estimates

\[ \hat\omega_i = \hat\delta_0 + \hat\delta_1\hat w_i + \hat\delta_2\hat w_i^2. \]

Because the $\hat\omega_i$ are fitted values from a linear regression, nothing guarantees $\hat\omega_i > 0$ for all $i$,
something we need for the method in part b to make sense. To avoid this problem, we might
use $\mathrm{Var}(w|x) = \exp\{\delta_0 + \delta_1 E(w|x) + \delta_2[E(w|x)]^2\}$ instead, and use the gamma QMLE with
the squared residuals as the dependent variable.

e. The Stata code below carries out the procedure from part d, except that, because 13 estimated
variances were not positive, the exponential variance function was used instead, with a gamma
QMLE. It produces the estimate $\hat\beta = .689$. The regression coefficient is not too different:
$\tilde\beta = .644$ (se = .235).



use attend

gen ACTsq = ACT^2
gen ACTcu = ACT^3
gen priGPAsq = priGPA^2
gen priGPAcu = priGPA^3

sum atndrte
    Variable |       Obs        Mean
-------------+----------------------
     atndrte |       680    81.70956

replace atndrte = atndrte/100
(680 real changes made)

glm atndrte priGPA priGPAsq priGPAcu ACT ACTsq ACTcu frosh soph, fam(bin)
    link(logit) robust
note: atndrte has noninteger values

Generalized linear models                          No. of obs      =       680
Optimization     : ML                              Residual df     =       671
Deviance         =   87.0709545                    (1/df) Deviance =   .129763
Pearson          =  85.07495268                    (1/df) Pearson  =  .1267883

Variance function: V(u) = u*(1-u/1)                [Binomial]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             =  .6831657
Log pseudolikelihood = -223.2763498                BIC             = -4289.253

             |               Robust
     atndrte |      Coef.   Std. Err.    P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------
      priGPA |  -3.371154   2.195517     0.125     -7.67429    .9319806
    priGPAsq |   1.886443   .8972586     0.036     .1278489    3.645038
    priGPAcu |  -.2454989    .118004     0.037    -.4767825   -.0142153
         ACT |   .5538998   .6744028     0.411    -.7679054    1.875705
       ACTsq |  -.0280986   .0304868     0.357    -.0878516    .0316544
       ACTcu |   .0003858   .0004505     0.392     -.000497    .0012687
       frosh |   .3939498   .1155299     0.001     .1675154    .6203841
        soph |   .0941678   .1006569     0.350    -.1031161    .2914517
       _cons |  -.7731446    5.13392     0.880    -10.83544    9.289154

predict atndrteh
(option mu assumed; predicted mean atndrte)

gen rh = atndrte - atndrteh
gen rhsq = rh^2
gen atndrtehsq = atndrteh^2

reg rhsq atndrteh atndrtehsq

      Source |       SS       df       MS              Number of obs =     680
-------------+------------------------------
       Model |  .098172929     2  .049086465
    Residual |  .894850267   677  .001321788
-------------+------------------------------
       Total |  .993023196   679  .001462479

        rhsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    atndrteh |   .1604177   .1514883     1.06   0.290    -.1370257     .457861
  atndrtehsq |  -.1854786   .0994129    -1.87   0.063    -.3806733    .0097161
       _cons |   .0137235   .0571515     0.24   0.810    -.0984919    .1259389


predict omegah 
(option xb assumed; fitted values) 


sum omegah 
Variable | Obs Mean Std. Dev. Min Max 
omegah | 680 .0192213 .0120243 -.0049489 .0483992 


count if omegah < 0 
13 


drop omegah 


glm rhsq atndrteh atndrtehsq, fam(gamma) link(log) 


Generalized linear models                          No. of obs      =       680
Optimization     : ML                              Residual df     =       677
                                                   Scale parameter =  3.781574
Deviance         =   1657.02942                    (1/df) Deviance =  2.447606
Pearson          =  2560.125871                    (1/df) Pearson  =  3.781574

Variance function: V(u) = u^2                      [Gamma]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             = -6.298906
Log likelihood   =  2144.628097                    BIC             = -2758.427

             |                 OIM
        rhsq |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    atndrteh |   21.19375   8.816922     2.40   0.016     3.912901     38.4746
  atndrtehsq |  -17.96346   5.764534    -3.12   0.002    -29.26174   -6.665185
       _cons |  -9.308977    3.33455    -2.79   0.005    -15.84458   -2.717338


predict omegah 
(option mu assumed; predicted mean rhsq) 


gen kh = rh*stndfnl/omegah 


sum kh
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          kh |       680    .6890428    8.318917  -47.34537   41.90279

reg stndfnl atndrte priGPA priGPAsq priGPAcu ACT ACTsq ACTcu frosh soph,
    robust

Linear regression                                      Number of obs =     680
                                                       F(  9,   670) =   31.01
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2356
                                                       Root MSE      =   .8709

             |               Robust
     stndfnl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     atndrte |   .6444118   .2345274     2.75   0.006     .1839147    1.104909
      priGPA |   1.987666   2.651676     0.75   0.454     -3.21893    7.194262
    priGPAsq |  -1.055604    1.01697    -1.04   0.300    -3.052436    .9412284
    priGPAcu |   .1842414   .1262839     1.46   0.145    -.0637183    .4322011
         ACT |   .3059699   .7236971     0.42   0.673    -1.115017    1.726957
       ACTsq |  -.0141693   .0319499    -0.44   0.658    -.0769033    .0485648
       ACTcu |   .0002633   .0004629     0.57   0.570    -.0006456    .0011722
       frosh |  -.1138172   .1035799    -1.10   0.272    -.3171976    .0895631
        soph |  -.1863224   .0870459    -2.14   0.033    -.3572381   -.0154067
       _cons |  -4.501184   5.629291    -0.80   0.424    -15.55436     6.55199


21.8. a. From (21.129) and (21.130), we are assuming

\[ a = \gamma_0 + x\gamma + u, \qquad E(u|x,z) = 0, \]

and so we can write

\[ y = \gamma_0 + \beta w + x\gamma + u + e \equiv \gamma_0 + \beta w + x\gamma + r, \]

where $E(r|x,z) = E(u|x,z) + E(e|x,z) = 0$. Therefore, we need to add the usual rank condition:
$z$ must appear with nonzero coefficient vector in the linear projection of $w$ on $(1,x,z)$. More
precisely, if

\[ L(w|1,x,z) = \xi_0 + x\xi_1 + z\xi_2 \]

then $\xi_2 \neq 0$.

b. If

\[ w = \max(0, \pi_0 + x\pi_1 + z\pi_2 + v), \qquad D(v|x,z) \sim \mathrm{Normal}(0, \eta^2), \]

then

\[ E(w|x,z) = \Phi(q\pi/\eta)\cdot q\pi + \eta\cdot\phi(q\pi/\eta), \]

where $q\pi = \pi_0 + x\pi_1 + z\pi_2$ [see equation (17.14)]. Because $E(w|x,z)$ is a function of $(x,z)$ and
we have

\[ y = \gamma_0 + \beta w + x\gamma + r, \qquad E(r|x,z) = 0, \]

we can use $\Phi(q\pi/\eta)\cdot q\pi + \eta\cdot\phi(q\pi/\eta)$ as a valid instrument for $w$. (Remember, any function
of $(x,z)$ is uncorrelated with $r$ provided the second moments exist.) Because we do not know $\pi$
or $\eta$, we replace them with estimators. In other words, use $\Phi(q_i\hat\pi/\hat\eta)\cdot q_i\hat\pi + \hat\eta\cdot\phi(q_i\hat\pi/\hat\eta)$ as the
IV for $w_i$, where $\hat\pi$ and $\hat\eta$ are the estimates from an initial Tobit MLE.
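
To make the instrument in part b concrete, the sketch below evaluates the Tobit conditional mean for a given index value and scale. The function name `tobit_mean` and the example inputs are hypothetical; in practice the index and the scale would come from the first-stage Tobit estimates.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_mean(index, eta):
    """E(w|x,z) when w = max(0, index + v) and v ~ Normal(0, eta^2)."""
    if eta <= 0:
        raise ValueError("eta must be positive")
    s = index / eta
    return norm_cdf(s) * index + eta * norm_pdf(s)


iv_at_zero = tobit_mean(0.0, 1.0)      # equals the standard normal density at 0
iv_far_right = tobit_mean(10.0, 1.0)   # censoring rarely binds, so close to 10
```

The function is nonlinear in the index near zero and approximately linear for large index values, where the corner at zero is rarely reached.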
c. By equation (14.57), the optimal IV for $w$ is

\[ E(w|x,z)/\mathrm{Var}(r|x,z). \]

But $E(e|a,x,z) = 0$ implies that $e$ and $a$ are uncorrelated, conditional on $(x,z)$. Therefore,

\[ \mathrm{Var}(u + e|x,z) = \mathrm{Var}(u|x,z) + \mathrm{Var}(e|x,z) = \sigma_u^2 + \sigma_e^2 \equiv \sigma_r^2. \]

Therefore, $\mathrm{Var}(r|x,z)$ is constant, and $E(w|x,z)$ can serve as the optimal IV for $w$. As usual,
we replace the parameters in $E(w|x,z)$ with $\sqrt{N}$-consistent estimators. The results in Chapter 6
on generated instruments can be used to show that the resulting IV estimator has the same
$\sqrt{N}$-asymptotic distribution as if we knew $\pi$ and $\eta$.

d. An alternative method would be to run the OLS regression

\[ y_i \text{ on } 1,\ \hat w_i,\ x_i, \qquad i = 1,\ldots,N, \]

where

\[ \hat w_i = \Phi(q_i\hat\pi/\hat\eta)\cdot q_i\hat\pi + \hat\eta\cdot\phi(q_i\hat\pi/\hat\eta) \]

are the estimated conditional means. While this “plug-in” approach may produce estimates
similar to the IV approach, it is less preferred for the same reasons we covered for the probit
case. First, using the $\hat w_i$ as regressors rather than instruments is less robust: using them as
regressors essentially requires the Tobit model to be correctly specified for $w$ given $(x,z)$.
Second, valid standard errors are harder to get using $\hat w_i$ as a regressor as opposed to an IV.
Third, the plug-in procedure does not appear to be optimal within an interesting class of
estimators. (By contrast, we know that the IV estimator is optimal in the class of IV estimators
under the assumptions given for part c.)

e. Estimate $y_i = \eta_0 + x_i\gamma + \beta w_i + w_i\cdot(x_i - \bar{x})\delta + error_i$ by IV, using
$[1, x_i, \hat w_i, \hat w_i\cdot(x_i - \bar{x})]$ as instruments, where the $\hat w_i$ are the Tobit fitted values. This would be
generally inefficient, as the error, $r$, is not necessarily homoskedastic. Another drawback is
that this would not generate overidentifying restrictions.

21.9. a. A histogram of the estimated propensity score for the untreated (train = 0) and
treated (train = 1) cases is given below. There is a clear problem with overlap, as can be seen
by studying the histogram for the control group: over 80% of units have propensity scores that
are zero or practically zero. This means there are values of $x$ where $p(x) = 0$ or is barely
distinguishable from zero.

The large differences in the histograms for the control and treatment groups spell trouble.
Because $p(x)$ is just a particular function of $x$, ideally its distribution would be similar across
the control and treatment groups, and this is clearly not the case. The problem this causes is
easy to see when thinking of matching on the propensity score. We need to find both
control and treated units with similar values of $p(x)$, but the histograms make it clear that there
are very few units in the control group with $\hat p(x_i) > .5$, whereas this is where the bulk of the
observations lie for the treated group.

For comparison, the same histograms are plotted using the experimental data in
JTRAIN2.RAW. Now the histograms are virtually indistinguishable and neither has mass at
zero or one.
use jtrain3

logit train age educ black hisp married re74 re75

Logistic regression                               Number of obs   =       2675
                                                  LR chi2(7)      =     872.82
                                                  Prob > chi2     =     0.0000
Log likelihood = -236.23799                       Pseudo R2       =     0.6488

       train |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0840291    .014761    -5.69   0.000    -.1129601    -.055098
        educ |  -.0624764   .0513973    -1.22   0.224    -.1632134    .0382605
       black |   2.242955   .3176941     7.06   0.000     1.620286    2.865624
        hisp |   2.094338   .5584561     3.75   0.000     .9997841    3.188892
     married |  -1.588358   .2602448    -6.10   0.000    -2.098428   -1.078287
        re74 |   -.117043   .0293604    -3.99   0.000    -.1745882   -.0594977
        re75 |  -.2577589   .0394991    -6.53   0.000    -.3351758   -.1803421
       _cons |   2.302714   .9112559     2.53   0.012     .5166853    4.088743

Note: 158 failures and 0 successes completely determined.

predict phat
(option pr assumed; Pr(train))

histogram phat, fraction by(train)

[Figure: histograms of the estimated propensity score, P(train|x), for train = 0 and train = 1; vertical axis: Fraction.]
use jtrain2

logit train age educ black hisp married re74 re75

Logistic regression                               Number of obs   =        445
                                                  LR chi2(7)      =       8.58
                                                  Prob > chi2     =     0.2840
Log likelihood = -297.80826                       Pseudo R2       =     0.0142

       train |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0107155    .014017     0.76   0.445    -.0167572    .0381882
        educ |   .0628366   .0558026     1.13   0.260    -.0465346    .1722077
       black |  -.3553063   .3577202    -0.99   0.321    -1.056425    .3458123
        hisp |  -.9322569   .5001292    -1.86   0.062    -1.912492    .0479784
     married |   .1440193   .2734583     0.53   0.598    -.3919492    .6799878
        re74 |  -.0221324   .0252097    -0.88   0.380    -.0715425    .0272777
        re75 |   .0459029   .0429705     1.07   0.285    -.0383177    .1301235
       _cons |  -.9237055   .7693924    -1.20   0.230    -2.431687    .5842759

predict phat
(option pr assumed; Pr(train))

histogram phat, fraction by(train)

[Figure: histograms of the estimated propensity score, P(train|x), for train = 0 and train = 1, experimental data; vertical axis: Fraction.]
b. Using the sample restricted to avgre < 15 helps a little in that there now seem to be at
least some untreated units in bins with $\hat p(x_i) > .3$. But there are not many. The pile-up near
zero for the control group is still present. The two histograms still look very different from those for the
experimental data in JTRAIN2.RAW.

[Figure: histograms of the estimated propensity score, P(train|x), for train = 0 and train = 1, sample restricted to avgre < 15; vertical axis: Fraction.]
c. (Bonus Part) Suppose that, after estimating the logit model for train using all 2,675
observations, we drop all data with $\hat p(x_i) < .05$. We then reestimate the logit model using the
remaining observations. How many observations are left? Obtain the resulting histograms as in
part a and part b.

Solution

The Stata session is given below. Only 422 observations are left after dropping those with
$\hat p(x_i) < .05$. The histograms look much better in terms of overlap: for the most part, it appears
that for $\hat p(x_i)$ within a given bin, there are both treated and untreated observations. But the skew
of the distributions is completely different (which is not too surprising).
. use jtrain3

. qui logit train age educ black hisp married re74 re75

. predict phat
(option pr assumed; Pr(train))

. drop if phat < .05
(2253 observations deleted)

. drop phat

. qui logit train age educ black hisp married re74 re75

. predict phat
(option pr assumed; Pr(train))

. histogram phat, fraction by(train)

[Figure: histograms of the reestimated propensity score, P(train|x), for train = 0 and train = 1, after dropping observations with phat < .05; vertical axis: Fraction.]
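21.10. a. This is just a problem in the asymptotic theory of simple regression with a binary
explanatory variable. With $y_i = \mu_0 + \tau w_i + v_{i0}$ we have that $w_i$ is independent of $v_{i0}$, and so
there is no heteroskedasticity. It follows that (see Theorem 4.2)

\[ \mathrm{Avar}\,\sqrt{N}(\tilde\tau - \tau) = \frac{\mathrm{Var}(v_{i0})}{\mathrm{Var}(w_i)} = \frac{\mathrm{Var}(v_{i0})}{\rho(1-\rho)} \]

because $P(w_i = 1) = \rho$. This means, by definition,

\[ \mathrm{Avar}(\tilde\tau) = \frac{\mathrm{Var}(v_{i0})}{N\rho(1-\rho)}. \]

b. By definition of the linear projection we can write

\[ y_{i0} = \alpha_0 + x_i\beta_0 + u_{i0}, \qquad E(u_{i0}) = 0, \quad E(x_i'u_{i0}) = 0. \]

Now we just plug this into $y_i = y_{i0} + \tau w_i$:

\[ y_i = \alpha_0 + x_i\beta_0 + u_{i0} + \tau w_i = \alpha_0 + \tau w_i + x_i\beta_0 + u_{i0}. \]

The problem says to assume that $w_i$ is independent of $(y_{i0}, x_i)$ and so $w_i$ is actually independent
of $(x_i, u_{i0})$ [because $u_{i0}$ is a function of $(y_{i0}, x_i)$].

c. Let $z_i = (w_i, x_i)$ be the set of nonconstant regressors and let $\gamma = (\tau, \beta_0')'$. Then, as we
showed in Chapter 4, if $\hat\gamma$ is the OLS estimator under random sampling,

\[ \sqrt{N}(\hat\gamma - \gamma) = [\mathrm{Var}(z_i)]^{-1} N^{-1/2}\sum_{i=1}^{N} (z_i - \mu_z)'u_{i0} + o_p(1). \]

Given that $\mathrm{Cov}(x_i, w_i) = 0$, when we restrict attention to the first element, $\sqrt{N}(\hat\tau - \tau)$, we get

\[ \sqrt{N}(\hat\tau - \tau) = [\mathrm{Var}(w_i)]^{-1} N^{-1/2}\sum_{i=1}^{N} (w_i - \rho)u_{i0} + o_p(1). \]

Therefore, using independence between $u_{i0}$ and $w_i$,

\[ \mathrm{Avar}\,\sqrt{N}(\hat\tau - \tau) = \frac{\mathrm{Var}(u_{i0})}{\mathrm{Var}(w_i)} = \frac{\mathrm{Var}(u_{i0})}{\rho(1-\rho)} \]

and so

\[ \mathrm{Avar}(\hat\tau) = \frac{\mathrm{Var}(u_{i0})}{N\rho(1-\rho)}. \]

d. Because $y_{i0} = \mu_0 + v_{i0}$, $\mathrm{Var}(v_{i0}) = \mathrm{Var}(y_{i0})$. Using the linear projection representation
$y_{i0} = \alpha_0 + x_i\beta_0 + u_{i0}$,

\[ \mathrm{Var}(y_{i0}) = \mathrm{Var}(x_i\beta_0) + \mathrm{Var}(u_{i0}) \]

and, assuming that $\mathrm{Var}(x_i)$ has full rank, $\mathrm{Var}(x_i\beta_0) > 0$ whenever $\beta_0 \neq 0$. So

\[ \mathrm{Var}(u_{i0}) < \mathrm{Var}(y_{i0}) = \mathrm{Var}(v_{i0}). \]

It follows by comparing $\mathrm{Avar}(\hat\tau)$ and $\mathrm{Avar}(\tilde\tau)$ that $\mathrm{Avar}(\hat\tau) < \mathrm{Avar}(\tilde\tau)$ whenever $\beta_0 \neq 0$, that
is, whenever $x_i$ is correlated with $y_{i0}$.

e. Even though $\hat\tau$ is asymptotically more efficient than $\tilde\tau$, $\hat\tau$ is generally biased if
$E(y_{i0}|x_i) \neq \alpha_0 + x_i\beta_0$. [If $E(y_{i0}|x_i) = \alpha_0 + x_i\beta_0$ then $\hat\tau$ would be conditionally unbiased, that is,
$E(\hat\tau|W,X) = \tau$.] The difference-in-means estimator $\tilde\tau$ is unbiased conditional on $W$ because
$E(y_i|W) = E(y_i|w_i) = \mu_0 + \tau w_i$.
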
21.11. Suppose that we allow full slope, as well as intercept, heterogeneity in a linear
representation of two counterfactual outcomes,

$$y_{i0} = a_{i0} + x_i b_{i0}$$
$$y_{i1} = a_{i1} + x_i b_{i1}.$$

Assume that the vector $(x_i, z_i)$ is independent of $(a_{i0}, b_{i0}, a_{i1}, b_{i1})$, which makes, as we
will see, $z_i$ instrumental variables candidates in a control function or correction function
setting.


a. Because $x_i$ is independent of $b_{ig}$, $g = 0, 1$,

$$E(y_{ig}) = E(a_{ig}) + E(x_i b_{ig}) = \alpha_g + E(x_i)E(b_{ig}) = \alpha_g + \psi\beta_g, \quad g = 0, 1.$$


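b. From part a,

$$\mu_0 = \alpha_0 + \psi\beta_0, \qquad \mu_1 = \alpha_1 + \psi\beta_1,$$

and so

$$\tau = (\alpha_1 - \alpha_0) + \psi(\beta_1 - \beta_0) = (\alpha_1 - \alpha_0) + \psi\delta.$$

Also,

$$y_{ig} = \alpha_g + x_i\beta_g + c_{ig} + x_i f_{ig}, \quad g = 0, 1,$$

and so, with $\delta = \beta_1 - \beta_0$, $e_i = c_{i1} - c_{i0}$, and $d_i = f_{i1} - f_{i0}$,

$$\begin{aligned}
y_i &= (1 - w_i) y_{i0} + w_i y_{i1} \\
&= (1 - w_i)(\alpha_0 + x_i\beta_0 + c_{i0} + x_i f_{i0}) + w_i(\alpha_1 + x_i\beta_1 + c_{i1} + x_i f_{i1}) \\
&= \alpha_0 + (\alpha_1 - \alpha_0) w_i + x_i\beta_0 + w_i x_i(\beta_1 - \beta_0) \\
&\quad + c_{i0} + w_i(c_{i1} - c_{i0}) + x_i f_{i0} + w_i x_i(f_{i1} - f_{i0}) \\
&= \alpha_0 + (\alpha_1 - \alpha_0) w_i + x_i\beta_0 + w_i x_i\delta + c_{i0} + w_i e_i + x_i f_{i0} + w_i x_i d_i.
\end{aligned}$$

Next, substitute

$$\alpha_0 = \mu_0 - \psi\beta_0, \qquad \alpha_1 - \alpha_0 = \tau - \psi\delta$$

to get

$$\begin{aligned}
y_i &= \mu_0 - \psi\beta_0 + (\tau - \psi\delta) w_i + x_i\beta_0 + w_i x_i\delta + c_{i0} + w_i e_i + x_i f_{i0} + w_i x_i d_i \\
&= \mu_0 + \tau w_i + (x_i - \psi)\beta_0 + w_i(x_i - \psi)\delta + c_{i0} + x_i f_{i0} + w_i e_i + w_i x_i d_i,
\end{aligned}$$

which is what we wanted to show.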


c. So that there is no notational conflict, write $E(d_i|a_i, x_i, z_i) = a_i\zeta$ and keep $\theta$ to
index the binary response model. Now take the expectation of (21.149) conditional on
$(a_i, x_i, z_i)$, using the fact that $w_i$ is a function of $(a_i, x_i, z_i)$:

$$\begin{aligned}
E(y_i|a_i, x_i, z_i) &= \mu_0 + \tau w_i + (x_i - \psi)\beta_0 + w_i(x_i - \psi)\delta \\
&\quad + E(c_{i0}|a_i, x_i, z_i) + x_i E(f_{i0}|a_i, x_i, z_i) \\
&\quad + w_i E(e_i|a_i, x_i, z_i) + w_i x_i E(d_i|a_i, x_i, z_i) \\
&= \mu_0 + \tau w_i + (x_i - \psi)\beta_0 + w_i(x_i - \psi)\delta \\
&\quad + \rho_0 a_i + a_i x_i\eta_0 + \xi w_i a_i + w_i a_i x_i\zeta.
\end{aligned}$$


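d. Just use iterated expectations along with

$$E(a_i|w_i, q_i) = h(w_i, q_i\theta) \equiv w_i\lambda(q_i\theta) - (1 - w_i)\lambda(-q_i\theta).$$

So

$$\begin{aligned}
E(y_i|w_i, x_i, z_i) &= \mu_0 + \tau w_i + (x_i - \psi)\beta_0 + w_i(x_i - \psi)\delta \\
&\quad + \rho_0 h(w_i, q_i\theta) + h(w_i, q_i\theta) x_i\eta_0 + \xi w_i h(w_i, q_i\theta) + w_i h(w_i, q_i\theta) x_i\zeta.
\end{aligned}$$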


e. Given $E(y_i|w_i, x_i, z_i)$, the CF method is straightforward. In the first step, estimate probit
of $w_i$ on $q_i$ to get $\hat{\theta}$, and then compute $\hat{h}_i = h(w_i, q_i\hat{\theta})$. Then run the OLS
regression

$$y_i \text{ on } 1,\ w_i,\ x_i - \bar{x},\ w_i(x_i - \bar{x}),\ \hat{h}_i,\ \hat{h}_i x_i,\ w_i\hat{h}_i,\ w_i\hat{h}_i x_i, \quad i = 1, \ldots, N.$$

We replace $\psi$ with the sample average, $\bar{x}$. The coefficient on $w_i$ is $\hat{\tau}$.

Compared with the regression in equation (21.85), we have included the interactions $\hat{h}_i x_i$
and $w_i\hat{h}_i x_i$. These account for the random coefficients in the counterfactual equations.
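
The second-step regressors in part e can be sketched in code. This is only an illustration under stated assumptions: $\lambda(\cdot)$ is taken to be the inverse Mills ratio $\phi(\cdot)/\Phi(\cdot)$, $x_i$ is treated as a scalar for simplicity, and the function names and inputs are hypothetical. The first-stage probit and the OLS step themselves are not shown.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def inverse_mills(index):
    """lambda(v) = phi(v) / Phi(v), the inverse Mills ratio."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


def generalized_residual(w, index):
    """h(w, q*theta) = w * lambda(q*theta) - (1 - w) * lambda(-q*theta)."""
    return w * inverse_mills(index) - (1 - w) * inverse_mills(-index)


def cf_regressors(w, x, x_bar, index):
    """Second-step regressor row for one observation (scalar x is an assumption).

    Order: 1, w, x - xbar, w(x - xbar), h, h*x, w*h, w*h*x.
    `index` stands for the fitted probit index q_i * theta_hat.
    """
    h = generalized_residual(w, index)
    return [1.0, w, x - x_bar, w * (x - x_bar), h, h * x, w * h, w * h * x]


# Hypothetical observation: treated unit, x = 2.0, sample mean 1.5, fitted index 0.3.
print(cf_regressors(w=1, x=2.0, x_bar=1.5, index=0.3))
```

Because $\hat{h}_i$ depends on the first-stage estimate, any standard errors computed from a regression on these columns would still need the adjustment discussed in part f.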

f. Of course we could work through the delta method to obtain a valid asymptotic standard
error for $\hat{\tau}$, but bootstrapping both steps in the procedure provides a simple alternative.


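g. We just compute $E(y_{ig}|x_i)$ for $g = 0, 1$:

$$\begin{aligned}
E(y_{ig}|x_i) &= E(a_{ig} + x_i b_{ig}|x_i) \\
&= E(a_{ig}|x_i) + x_i E(b_{ig}|x_i) \\
&= \alpha_g + x_i\beta_g,
\end{aligned}$$

where the last equality holds by the independence assumption. Therefore,

$$\tau_{ate}(x) = (\alpha_1 - \alpha_0) + x(\beta_1 - \beta_0) = \tau - \psi\delta + x\delta = \tau + (x - \psi)\delta.$$

So

$$\hat{\tau}_{ate}(x) = \hat{\tau} + (x - \bar{x})\hat{\delta}.$$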
21.12. a. The terms $c_{i0}$ and $x_i f_{i0}$ have zero means conditional on $(x_i, z_i)$ by the
independence of all heterogeneity terms, $(a_{i0}, b_{i0}, a_{i1}, b_{i1})$, and $(x_i, z_i)$. Remember,
$c_{i0} = a_{i0} - \alpha_0$ and $f_{i0} = b_{i0} - \beta_0$.
b. The correction functions in this case are $E(w_i e_i|x_i, z_i)$ and
$E(w_i x_i d_i|x_i, z_i) = x_i E(w_i d_i|x_i, z_i)$. Now we just use the formula in equation (21.80)
because $E(e_i|a_i, x_i, z_i) = \xi a_i$ and $E(d_i|a_i, x_i, z_i) = a_i\zeta$. Therefore,

$$E(w_i e_i|x_i, z_i) = \xi\phi(q_i\theta)$$
$$E(w_i x_i d_i|x_i, z_i) = x_i\phi(q_i\theta)\zeta = \phi(q_i\theta) x_i\zeta.$$


c. From part b we can write

$$y_i = \mu_0 + \tau w_i + (x_i - \psi)\beta_0 + w_i(x_i - \psi)\delta + \xi\phi(q_i\theta) + \phi(q_i\theta) x_i\zeta + c_{i0} + x_i f_{i0} + r_i$$

where

$$r_i = [w_i e_i - E(w_i e_i|x_i, z_i)] + [w_i x_i d_i - E(w_i x_i d_i|x_i, z_i)]$$

and so $E(r_i|x_i, z_i) = 0$. The estimating equation, after the first-stage probit to get $\hat{\theta}$, is

$$y_i = \mu_0 + \tau w_i + (x_i - \bar{x})\beta_0 + w_i(x_i - \bar{x})\delta + \xi\phi(q_i\hat{\theta}) + \phi(q_i\hat{\theta}) x_i\zeta + error_i$$

which we can estimate using IV with instruments, say,

$$[1,\ \hat{\Phi}_i,\ (x_i - \bar{x}),\ \hat{\Phi}_i\cdot(x_i - \bar{x}),\ \hat{\phi}_i,\ \hat{\phi}_i\cdot x_i].$$

d. Under the null hypothesis, $\xi = 0$ and $\zeta = 0$. The conditions sufficient to ignore the
first-stage estimator for IV estimators in Chapter 6 hold here, so we can use a standard Wald
test (perhaps made robust to heteroskedasticity) to test joint significance of the $K + 1$ variables
$(\hat{\phi}_i, \hat{\phi}_i\cdot x_i)$. Remember, these are acting as their own instruments in the estimation.

21.13. Fractional probit or logit are natural, or some other model that keeps the fitted
values in the unit interval. For the treatment rule $w_i = 1[x_i \geq c]$, let $G(\hat{\alpha}_0 + \hat{\beta}_0 x)$ be the
estimated fractional response model using the data with $x_i < c$ and let $G(\hat{\alpha}_1 + \hat{\beta}_1 x)$ be the
estimated model using the data with $x_i \geq c$. Probably the Bernoulli QMLE would be used.
Then, similar to equation (21.104),

$$\hat{\tau}_c = G(\hat{\alpha}_1 + \hat{\beta}_1 c) - G(\hat{\alpha}_0 + \hat{\beta}_0 c).$$

The delta method or bootstrapping can be used to obtain valid inference for

$$\tau_c = G(\alpha_1 + \beta_1 c) - G(\alpha_0 + \beta_0 c).$$
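
The plug-in estimate of the jump can be sketched as follows. This is a minimal illustration, assuming a logistic link for $G$ and made-up coefficient values; the function names are hypothetical, and the two fractional-response fits on either side of the cutoff are taken as given rather than estimated here.

```python
import math


def logistic(v):
    """Logistic link G(v) = 1 / (1 + exp(-v)); one possible choice of G."""
    return 1.0 / (1.0 + math.exp(-v))


def fractional_rd_jump(alpha0, beta0, alpha1, beta1, c, link=logistic):
    """tau_c_hat = G(alpha1 + beta1 * c) - G(alpha0 + beta0 * c).

    (alpha0, beta0) come from the fit using x < c and (alpha1, beta1) from the fit
    using x >= c; both fits are evaluated at the cutoff c.
    """
    return link(alpha1 + beta1 * c) - link(alpha0 + beta0 * c)


# Hypothetical coefficients (not from the text), with cutoff c = 5.
tau_c_hat = fractional_rd_jump(alpha0=-1.0, beta0=0.2, alpha1=-0.5, beta1=0.2, c=5.0)
print(round(tau_c_hat, 4))
```

For inference, the same function could be called inside a bootstrap loop that re-estimates both fits on each resample.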


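21.14. a. Just take the expected value of $y_{it}(g) = a_{itg} + x_{it}\beta_g$:

$$\mu_{tg} = E(a_{itg}) + E(x_{it})\beta_g = \alpha_{tg} + \psi_t\beta_g, \quad g = 0, 1.$$

Therefore, for each $t$,

$$\tau_{t,ate} = (\alpha_{t1} - \alpha_{t0}) + \psi_t(\beta_1 - \beta_0).$$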


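b. Use $\alpha_{t0} = \mu_{t0} - \psi_t\beta_0$ and $(\alpha_{t1} - \alpha_{t0}) = \tau_t - \psi_t(\beta_1 - \beta_0)$ and plug into the
equation for $y_{it}$:

$$\begin{aligned}
y_{it} &= (1 - w_{it})(\alpha_{t0} + x_{it}\beta_0 + c_{it0}) + w_{it}(\alpha_{t1} + x_{it}\beta_1 + c_{it1}) \\
&= \alpha_{t0} + w_{it}(\alpha_{t1} - \alpha_{t0}) + x_{it}\beta_0 + w_{it} x_{it}(\beta_1 - \beta_0) + c_{it0} + w_{it}(c_{it1} - c_{it0}) \\
&= (\mu_{t0} - \psi_t\beta_0) + w_{it}[\tau_t - \psi_t(\beta_1 - \beta_0)] \\
&\quad + x_{it}\beta_0 + w_{it} x_{it}(\beta_1 - \beta_0) + c_{it0} + w_{it}(c_{it1} - c_{it0}) \\
&= \mu_{t0} + \tau_t w_{it} + (x_{it} - \psi_t)\beta_0 + w_{it}(x_{it} - \psi_t)\delta + c_{it0} + w_{it} e_{it}
\end{aligned}$$

where $\delta = \beta_1 - \beta_0$ and $e_{it} = c_{it1} - c_{it0}$.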


c. Plugging in for ci and ei gives 


Vit = Lo + Twit t (Xi WV Bo + Wit(Xir —y,)6 


+ (X; T HS, T (Z; z H;)5> +i + wil (Xi 7 uM; ZE (Z; = HN) + Vir] 


= Uo + TiWi + (Xi — VB + Wit(Xit — v6 
+ (Xi — Wz )§, + (Zi — pz) + Wili — Wy), + Wi(Zi — yM, 


+ Fio + WitVit 


d. As usual, we first condition on (gir, Xi, Zi): 


Eiddin Xi,Zi) = Lio + Twit + (Xi WB H Wi(Xi — W,)6 


+ E(riolq it, Xi, Zi) + WtE (Vil it, Xi, Zi) 


= Ho + Twit + (Xi v)Bo Wi(Xir — YW) 


+ Aogit + PWit it. 


Now condition on (wis Xi, Zi) using iterated expectations: 


EQ ulWin Xi, Zi) = Ho + TiWi + (Xi — Y, Bo + WiKi — y, 


+ aoh(wi, g0) + pwih(wi, 8,0), 
where 


$$h(w_{it}, g_{it}\theta) = w_{it}\lambda(g_{it}\theta) - (1 - w_{it})\lambda(-g_{it}\theta)$$
+ (Ki — Wy)S, + Zi — Mg)Go + Wie Ki — By), + wilZi — uM 


+ (i —pE, + Zi- py )E, + wili — BN, + walZi — WN, 


+ (X; — Wy), + (Zi — p3), + wiki — By), + WilZi — BN, 


and g 0 = 6004 x01 t ZitO2 t x03 t Z;04. 
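e. In the first step we can use pooled probit of $w_{it}$ on $1, x_{it}, z_{it}, \bar{x}_i, \bar{z}_i$ to obtain $\hat{\theta}$ and
$\hat{h}_{it} = h(w_{it}, g_{it}\hat{\theta})$. Then we can use pooled OLS in a second step:

$$y_{it} \text{ on } 1,\ d2_t, \ldots, dT_t,\ w_{it},\ d2_t w_{it}, \ldots, dT_t w_{it},\ (x_{it} - \bar{x}_t),\ w_{it}\cdot(x_{it} - \bar{x}_t),$$
$$(\bar{x}_i - \bar{x}),\ (\bar{z}_i - \bar{z}),\ w_{it}\cdot(\bar{x}_i - \bar{x}),\ w_{it}\cdot(\bar{z}_i - \bar{z}),\ \hat{h}_{it},\ w_{it}\cdot\hat{h}_{it}$$

where $dr_t$ is a time dummy for period $r$, $\hat{h}_{it} = h(w_{it}, g_{it}\hat{\theta})$, and overbars denote sample
averages.

Note that in the conditional expectation underlying the CF method, $E(y_{it}|w_{it}, x_i, z_i)$,
$\{w_{it} : t = 1, \ldots, T\}$ is not guaranteed to be strictly exogenous. Therefore, we cannot use
GLS-type methods without making extra assumptions.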

21.15. a. The Stata output is given below. There are 5,735 observations, and 2,184 received
a right-heart catheterization.

. tab rhc

      =1 if |
   received |
right heart |
catheteriza |
       tion |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      3,551       61.92       61.92
          1 |      2,184       38.08      100.00
------------+-----------------------------------
      Total |      5,735      100.00


b. The Stata output, using simple regression and heteroskedasticity-robust standard errors,
is given below. According to the estimate, people receiving an RHC have a .051 higher
probability of dying. The estimate has a robust t statistic of 3.95, so it is statistically different
from zero (and practically large, in the "wrong" direction).

. reg death rhc, robust

Linear regression                                      Number of obs =    5735
                                                       F(  1,  5733) =   15.56
                                                       Prob > F      =  0.0001
                                                       R-squared     =  0.0027
                                                       Root MSE      =  .47673

------------------------------------------------------------------------------
             |               Robust
       death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         rhc |   .0507212   .0128566     3.95   0.000     .0255174    .0759249
       _cons |   .6296818   .0081049    77.69   0.000     .6137931    .6455705
------------------------------------------------------------------------------


c. The Stata code and results are given below. We still obtain counterintuitive results:
$\hat{\tau}_{ate,reg} = .078$, with a bootstrapped standard error of .013, and $\hat{\tau}_{att,reg} = .066$
(se = .014). Thus, both estimates are statistically different from zero. We can either conclude
that the controls we include do not make treatment assignment ignorable or that the RHC
actually increases the probability of death.

clear all
capture program drop ateboot
program ateboot, eclass
    * Estimate logit on treatment and control groups separately
    tempvar touse
    gen byte `touse' = 1
    xi: logit death i.female i.race i.income i.cat1 i.cat2 i.ninsclas age if rhc
    predict d1hat
    xi: logit death i.female i.race i.income i.cat1 i.cat2 i.ninsclas age if ~rhc
    predict d0hat
    * Average the difference in fitted probabilities: full sample (ATE), treated (ATT)
    gen diff = d1hat - d0hat
    sum diff
    scalar ate = r(mean)
    sum diff if rhc
    scalar att = r(mean)
    matrix b = (ate, att)
    matrix colnames b = ate att
    ereturn post b, esample(`touse')
    ereturn display
    drop d1hat d0hat diff _I*
end

use catheter
bootstrap _b[ate] _b[att], reps(1000) seed(123): ateboot
program drop ateboot

. do catheter_reg


. use catheter

. bootstrap _b[ate] _b[att], reps(1000) seed(123): ateboot
(running ateboot on estimation sample)

Bootstrap replications (1000)

Bootstrap results                               Number of obs      =      5735
                                                Replications       =      1000

      command:  ateboot
        _bs_1:  _b[ate]
        _bs_2:  _b[att]

------------------------------------------------------------------------
             |   Observed   Bootstrap                Normal-based
             |      Coef.   Std. Err.    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------
       _bs_1 |   .0776176   .0129611     0.000     .0522143    .1030208
       _bs_2 |   .0656444     .01366     0.000     .0388713    .0924175
------------------------------------------------------------------------

. program drop ateboot

end of do-file
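
The averaging step inside `ateboot` can be restated outside Stata. The following is a minimal sketch, assuming the fitted probabilities from the two separate logits are already available; the function name and the four-observation data set are hypothetical and serve only to show the arithmetic.

```python
def regression_adjustment(p1_hat, p0_hat, treated):
    """Regression-adjustment ATE and ATT from fitted outcome probabilities.

    p1_hat[i]: fitted P(death = 1 | x_i) from the model estimated on treated units.
    p0_hat[i]: fitted P(death = 1 | x_i) from the model estimated on control units.
    treated[i]: 1 if unit i was treated, 0 otherwise.
    """
    diffs = [p1 - p0 for p1, p0 in zip(p1_hat, p0_hat)]
    ate = sum(diffs) / len(diffs)  # average over the full sample
    treated_diffs = [d for d, w in zip(diffs, treated) if w == 1]
    att = sum(treated_diffs) / len(treated_diffs)  # average over treated units only
    return ate, att


# Hypothetical fitted probabilities for four units (not from the catheter data).
ate, att = regression_adjustment(
    p1_hat=[0.7, 0.6, 0.8, 0.5],
    p0_hat=[0.6, 0.6, 0.7, 0.3],
    treated=[1, 0, 1, 0],
)
print(ate, att)
```

As in the do-file, valid standard errors would come from bootstrapping the whole procedure, including the two logit fits.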


d. The average $\hat{p}_i$ for the treated group is about .445 and it ranges from about .085 to .737.
For the control group, the numbers are .341, .044, and .738. Though the mean propensity
score is somewhat higher for the treated group, the ranges are comparable. The two histograms
show that for both groups the probabilities stay away from the extremes of zero and one, and
for every bin representing intervals of the estimated propensity score, there are several
individuals in the control and treatment groups. Overlap appears to be good.


. use catheter

. xi: logit rhc i.female i.race i.income i.cat1 i.cat2 i.ninsclas age
i.female          _Ifemale_0-1        (naturally coded; _Ifemale_0 omitted)
i.race            _Irace_0-2          (naturally coded; _Irace_0 omitted)
i.income          _Iincome_0-3        (naturally coded; _Iincome_0 omitted)
i.cat1            _Icat1_1-9          (naturally coded; _Icat1_1 omitted)
i.cat2            _Icat2_1-7          (naturally coded; _Icat2_1 omitted)
i.ninsclas        _Ininsclas_1-6      (naturally coded; _Ininsclas_1 omitted)

Logistic regression                               Number of obs   =       5735
                                                  LR chi2(26)     =     625.48
                                                  Prob > chi2     =     0.0000
Log likelihood = -3497.9617                       Pseudo R2       =     0.0821

------------------------------------------------------------------------------
         rhc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Ifemale_1 |   .1630495   .0587579     2.77   0.006     .0478861    .2782129
    _Irace_1 |   .0424279   .0827591     0.51   0.608    -.1197771    .2046328
    _Irace_2 |   .0393684   .1361998     0.29   0.773    -.2275784    .3063151
  _Iincome_1 |   .0443619   .0762726     0.58   0.561    -.1051296    .1938535
  _Iincome_2 |    .151793   .0892757     1.70   0.089    -.0231841    .3267701
  _Iincome_3 |   .1579471   .1140752     1.38   0.166    -.0656361    .3815303
    _Icat1_2 |    .498032    .107388     4.64   0.000     .2875553    .7085086
    _Icat1_3 |  -1.226306   .1495545    -8.20   0.000    -1.519428   -.9331849
    _Icat1_4 |  -.7173791   .1714465    -4.18   0.000    -1.053408   -.3813501
    _Icat1_5 |  -1.002513   1.085305    -0.92   0.356    -3.129671    1.124645
    _Icat1_6 |  -.6941957   .1260198    -5.51   0.000      -.94119   -.4472013
    _Icat1_7 |  -1.258815   .4833701    -2.60   0.009    -2.206203   -.3114273
    _Icat1_8 |  -.2076635   .1177652    -1.76   0.078     -.438479    .0231519
    _Icat1_9 |   1.003787   .0768436    13.06   0.000     .8531766    1.154398
    _Icat2_2 |   .9804654   1.465085     0.67   0.503    -1.891048    3.851979
    _Icat2_3 |  -.4141065   .4428411    -0.94   0.350    -1.282059    .4538461
    _Icat2_4 |  -.8864827   .8454718    -1.05   0.294    -2.543577    .7706116
    _Icat2_5 |   -.195389   .3933026    -0.50   0.619     -.966248      .57547
    _Icat2_6 |   1.034498    .369503     2.80   0.005     .3102859    1.758711
    _Icat2_7 |   .1415088   .3649828     0.39   0.698    -.5738443    .8568619
_Ininsclas_2 |   .1849583   .1216214     1.52   0.128    -.0534153    .4233318
_Ininsclas_3 |   .1082916    .152243     0.71   0.477    -.1900992    .4066824
_Ininsclas_4 |   .5216726   .1495659     3.49   0.000     .2285288    .8148164
_Ininsclas_5 |    .468176   .1122184     4.17   0.000      .248232    .6881199
_Ininsclas_6 |   .3742273   .1249122     3.00   0.003     .1294038    .6190508
         age |   .0006419    .002252     0.29   0.776    -.0037719    .0050557
       _cons |   -1.36677   .3834979    -3.56   0.000    -2.118412   -.6151284
------------------------------------------------------------------------------

. predict phat
(option pr assumed; Pr(rhc))

. sum phat if rhc

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        phat |      2184    .4449029    .1421669   .0851523   .7369323

. sum phat if ~rhc

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        phat |      3551    .3414058    .1517016   .0435625   .7379614


[Figure: histograms of the estimated propensity score by treatment status (rhc = 0, 1);
vertical axis: Fraction; horizontal axis: P(rhc = 1|x).]

e. The estimates using propensity score weighting are very similar to the
regression-adjustment estimates using logit models. The PSW estimates are $\hat{\tau}_{ate,psw} = .072$
(se = .013) and $\hat{\tau}_{att,psw} = .063$ (se = .014). Unfortunately, out of the 1,000 bootstrap
replications that I ran to obtain the standard errors, only 219 produced usable results because
the estimated propensity score for at least some of the draws was identically zero or identically
one (when the covariates perfectly classify rhc). But there is clearly no evidence, based on these
and the regression-adjustment estimates, that RHC reduces the probability of death. It appears
that RHC is being applied to patients based on variables that are not included in the data set
and that are associated with both mortality and a doctor recommending RHC.
do catheter_psw 
clear all 
capture program drop ateboot 
program ateboot, rclass 
* Estimate propensity score 
xi: logit rhc i.female i.race i.income i.cat1 i.cat2 i.ninsclas age
predict phat 
gen kiate = (rhc - phat)*death/(phat*(1 - phat)) 
sum kiate 
return scalar atew = r(mean) 
sum rhc 
scalar rho = r(mean) 
gen kiatt = (rhc - phat)*death/(1 - phat) 
sum kiatt 
return scalar attw = r(mean)/rho 
drop phat kiate kiatt _I* 
end 
use catheter 
bootstrap r(atew) r(attw), reps(1000) seed(123): ateboot 


program drop ateboot 


use catheter 


. bootstrap r(atew) r(attw), reps(1000) seed(123): ateboot
(running ateboot on estimation sample)

Bootstrap results                               Number of obs      =      5735
                                                Replications       =       219

      command:  ateboot
        _bs_1:  r(atew)
        _bs_2:  r(attw)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |    .071762   .0131705     5.45   0.000     .0459483    .0975758
       _bs_2 |   .0629458   .0140839     4.47   0.000      .035342    .0905497
------------------------------------------------------------------------------
Note: one or more parameters could not be estimated in 781 bootstrap
      replicates; standard-error estimates include only complete replications

. program drop ateboot

end of do-file
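
The two weighting formulas coded in `ateboot` (the `kiate` and `kiatt` variables) can be written compactly as follows. This is a minimal sketch that takes the fitted propensity scores as given; the function name and the toy data are hypothetical, and no trimming of extreme scores is attempted.

```python
def psw_estimates(y, w, p_hat):
    """Propensity score weighting estimates of ATE and ATT.

    ATE: mean of (w - p) * y / [p * (1 - p)].
    ATT: mean of (w - p) * y / (1 - p), divided by the sample share treated.
    """
    n = len(y)
    k_ate = [(wi - pi) * yi / (pi * (1.0 - pi)) for yi, wi, pi in zip(y, w, p_hat)]
    k_att = [(wi - pi) * yi / (1.0 - pi) for yi, wi, pi in zip(y, w, p_hat)]
    rho_hat = sum(w) / n  # fraction treated, as in `scalar rho = r(mean)`
    return sum(k_ate) / n, (sum(k_att) / n) / rho_hat


# Toy data (hypothetical): with a constant propensity score of 0.5, both
# estimates reduce to the simple difference in means, here 0.5 - 1.0 = -0.5.
ate_w, att_w = psw_estimates(y=[1, 0, 1, 1], w=[1, 1, 0, 0], p_hat=[0.5, 0.5, 0.5, 0.5])
print(ate_w, att_w)
```

The formulas divide by $\hat{p}$ and $1 - \hat{p}$, so a fitted score of exactly zero or one makes them undefined, which is one way a bootstrap replication can fail to produce a usable estimate.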

21.16. Use the data in REGDISC.RAW to answer this question. These are simulated data 
of a fuzzy regression discontinuity design with forcing variable x. The discontinuity is at 
x=5. 

a. Exactly half of the observations have $x_i > 5$, but 58.1% (1,162 out of 2,000) of the
observations are in the treatment group.

. sum z w

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           z |      2000          .5     .500125          0          1
           w |      2000        .581    .4935188          0          1

. tab w

      =1 if |
    treated |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        838       41.90       41.90
          1 |      1,162       58.10      100.00
------------+-----------------------------------
      Total |      2,000      100.00


b. The Stata output is given below. The graphs of the estimated probabilities (LPM and
logit) are fairly similar, although the LPM estimates a larger jump in the treatment
probabilities: for the LPM it is about .271 and for the logit it is about .192. In fact, the logit
estimate is closer to the true jump in the propensity score at x = 5, which is about .186. [The
treatment probability was generated from the probit model
$P(w = 1|x) = \Phi(.1 + .5\cdot 1[x \geq 5] + .3\cdot(x - 5))$, and so the jump in the probability of
treatment at x = 5 is $\Phi(.6) - \Phi(.1) \approx .186$.]
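
The stated value of the true jump can be checked directly from the probit formula in the bracketed remark. This is a small sketch using only the standard library; the function name is hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def true_propensity(x):
    """P(w = 1 | x) = Phi(.1 + .5 * 1[x >= 5] + .3 * (x - 5)), as stated in the text."""
    return PHI(0.1 + 0.5 * (1.0 if x >= 5 else 0.0) + 0.3 * (x - 5.0))


# Jump at the cutoff: right limit Phi(.6) minus left limit Phi(.1).
true_jump = PHI(0.6) - PHI(0.1)
print(round(true_jump, 3))
```

The result agrees with the .186 quoted above, against which the LPM (.271) and logit (.192) estimates are compared.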


. reg w x if ~z

      Source |       SS       df       MS
-------------+------------------------------
       Model |  17.5177744     1  17.5177744
    Residual |  180.953226   998  .181315857
-------------+------------------------------
       Total |     198.471   999   .19866967

------------------------------------------------------------------------------
           w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0916522   .0093244     9.83   0.000     .0733545    .1099499
       _cons |    .043984   .0269105     1.63   0.102    -.0088237    .0967917
------------------------------------------------------------------------------

. predict wh0_lpm
(option xb assumed; fitted values)

. gen wh0_lpm_5 = _b[_cons] + _b[x]*5 in 1
(1999 missing values generated)

. reg w x if z

      Source |       SS       df       MS
-------------+------------------------------
       Model |   4.4914493     1   4.4914493
    Residual |  94.1875507   998  .094376303
-------------+------------------------------
       Total |      98.679   999  .098777778

------------------------------------------------------------------------------
           w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0464084   .0067272     6.90   0.000     .0332073    .0596095
       _cons |   .5408787   .0513891    10.53   0.000     .4400356    .6417218
------------------------------------------------------------------------------

. predict wh1_lpm
(option xb assumed; fitted values)

. gen wh1_lpm_5 = _b[_cons] + _b[x]*5 in 1
(1999 missing values generated)

. gen jump_lpm = wh1_lpm_5 - wh0_lpm_5
(1999 missing values generated)

. gen what = what0 if ~z
(1000 missing values generated)

. replace what = what1 if z
(1000 real changes made)

. sum jump_lpm

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    jump_lpm |         1    .2706757           .   .2706757   .2706757

. qui logit w x if ~z

. predict wh0_logit
(option pr assumed; Pr(w))

. gen wh0_logit_5 = invlogit(_b[_cons] + _b[x]*5) in 1
(1999 missing values generated)

. qui logit w x if z

. predict wh1_logit
(option pr assumed; Pr(w))

. gen wh1_logit_5 = invlogit(_b[_cons] + _b[x]*5) in 1
(1999 missing values generated)

. gen jump_logit = wh1_logit_5 - wh0_logit_5
(1999 missing values generated)

. sum jump_logit

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  jump_logit |         1    .1922199           .   .1922199   .1922199

. gen wh_logit = wh0_logit if ~z
(1000 missing values generated)

. replace wh_logit = wh1_logit if z
(1000 real changes made)

. twoway (line wh_lpm x, sort) (line wh_logit x, sort)

[Figure: Estimated Propensity Scores; LPM and logit fitted probabilities plotted against x.]


c. We already computed the jump for the LPM estimates of the propensity score in part b.
Now we need to estimate the jump in $E(y|x)$ at x = 5. Using the linear model, this turns out to
be about .531. From equation (21.107) the estimate of $\tau_c$ is the ratio of the jumps, which is
about 1.96. In fact, the true effect is two, so this estimate is very close for this set of data. The
data on y were generated as $y_i = 1 + 2w_i + x_i/4 + u_i$ where $u_i$ is independent of $x_i$ and
treatment with a Normal(0, .36) distribution.
. reg y x if ~z

      Source |       SS       df       MS
-------------+------------------------------
       Model |  409.736838     1  409.736838
    Residual |  1109.52773   998  1.11175123
-------------+------------------------------
       Total |  1519.26457   999  1.52078535

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .4432576   .0230891    19.20   0.000     .3979488    .4885664
       _cons |   1.066599   .0666359    16.01   0.000     .9358364    1.197361
------------------------------------------------------------------------------

. gen yh0_5 = _b[_cons] + _b[x]*5 in 1
(1999 missing values generated)

. reg y x if z

      Source |       SS       df       MS
-------------+------------------------------
       Model |  230.759468     1  230.759468
    Residual |  742.020178   998  .743507193
-------------+------------------------------
       Total |  972.779646   999  .973753399

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .3326468   .0188819    17.62   0.000      .295594    .3696997
       _cons |   2.151037   .1442388    14.91   0.000      1.86799    2.434083
------------------------------------------------------------------------------

. gen yh1_5 = _b[_cons] + _b[x]*5 in 1
(1999 missing values generated)

. gen jumpy = yh1_5 - yh0_5
(1999 missing values generated)

. sum jumpy

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       jumpy |         1     .531384           .    .531384    .531384

. gen ate5 = jumpy/jump_lpm
(1999 missing values generated)

. sum ate5

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        ate5 |         1    1.963176           .   1.963176   1.963176
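
The ratio-of-jumps calculation can be reproduced from the four sets of reported OLS coefficients. This is a small sketch; the helper name is hypothetical, and the coefficient values are copied from the regression output shown for parts b and c.

```python
def fitted_at(cons, slope, x):
    """Fitted value from a simple linear regression at the point x."""
    return cons + slope * x


CUTOFF = 5.0

# Jump in the LPM propensity score at x = 5 (coefficients from reg w x if z / if ~z).
jump_w = fitted_at(0.5408787, 0.0464084, CUTOFF) - fitted_at(0.043984, 0.0916522, CUTOFF)

# Jump in E(y|x) at x = 5 (coefficients from reg y x if z / if ~z).
jump_y = fitted_at(2.151037, 0.3326468, CUTOFF) - fitted_at(1.066599, 0.4432576, CUTOFF)

# Fuzzy RD estimate: ratio of the outcome jump to the treatment-probability jump.
ate5 = jump_y / jump_w
print(round(jump_w, 6), round(jump_y, 6), round(ate5, 4))
```

Up to rounding of the displayed coefficients, this matches the values of `jump_lpm`, `jumpy`, and `ate5` in the Stata output.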


d. As is claimed in the text, the IV estimate from (21.108) is the same as the estimate from
(21.107), with a very slight rounding error in the sixth digit after the decimal point. The
heteroskedasticity-robust standard error is about .205. (The nonrobust standard error is about
.197.)

Because the true equation for y is linear in x, with an expected jump at x = 5, the IV
estimator (and hence the estimate from part c) is consistent even though $w_i$ follows a probit
model rather than an LPM.

. ivreg y x_5 zx_5 (w = z), robust

Instrumental variables (2SLS) regression               Number of obs =    2000
                                                       F(  3,  1996) = 3588.42
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8722
                                                       Root MSE      =   .5959

--------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t     [95% Conf. Interval]
-------------+------------------------------------------------------
           w |   1.963177   .2046892     9.59     1.56175    2.364604
         x_5 |    .263328   .0295197     8.92    .2054354    .3212206
        zx_5 |  -.0217891   .0214587    -1.02   -.0638729    .0202947
       _cons |    2.29689   .1352279    16.99    2.031688    2.562093
--------------------------------------------------------------------
Instrumented:  w
Instruments:   x_5 zx_5 z

e. Using only the data with $3 < x_i < 7$ results in 800 observations, rather than 2,000. The
estimate is substantially smaller than two but, more importantly, its standard error has
increased to about .327 from .205. Nevertheless, the 95% confidence interval for $\tau_c$ easily
contains the true value $\tau_c = 2$.

. ivreg y x_5 zx_5 (w = z) if x > 3 & x < 7, robust

Instrumental variables (2SLS) regression               Number of obs =     800

------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------
           w |   1.775465   .3267695     1.134033    2.416897
         x_5 |   .3471895   .0726118     .2046563    .4897226
        zx_5 |  -.0991082   .0772654    -.2507762    .0525599
       _cons |   2.442008   .2182725      2.01355    2.870466
------------------------------------------------------------
Instrumented:  w
Instruments:   x_5 zx_5 z

Solutions to Chapter 22 Problems 

22.1. a. In Stata, there are two possibilities for estimating a lognormal duration model: the
cnreg command (where we use the log of the duration as a response), and the streg
command (where we specify “lognormal” as the distribution). The streg command is more
flexible (and I use it in the next problem), but here I give the cnreg output. The value of the
log likelihood is −1,597.06.
. use recid

. cnreg ldurat workprg priors tserved felon alcohol drugs black married educ
    age, censored(cens)

Censored-normal regression                        Number of obs   =       1445
                                                  LR chi2(10)     =     166.74
                                                  Prob > chi2     =     0.0000
Log likelihood =  -1597.059                       Pseudo R2       =     0.0496

------------------------------------------------------------------------------
      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     workprg |  -.0625715   .1200369    -0.52   0.602    -.2980382    .1728951
      priors |  -.1372529   .0214587    -6.40   0.000    -.1793466   -.0951592
     tserved |  -.0193305   .0029779    -6.49   0.000    -.0251721    -.013489
       felon |   .4439947   .1450865     3.06   0.002     .1593903    .7285991
     alcohol |  -.6349092   .1442166    -4.40   0.000    -.9178072   -.3520113
       drugs |  -.2981602   .1327355    -2.25   0.025    -.5585367   -.0377837
       black |  -.5427179   .1174428    -4.62   0.000    -.7730958     -.31234
     married |   .3406837   .1398431     2.44   0.015      .066365    .6150024
        educ |   .0229196   .0253974     0.90   0.367    -.0269004    .0727395
         age |   .0039103   .0006062     6.45   0.000     .0027211    .0050994
       _cons |   4.099386    .347535    11.80   0.000     3.417655    4.781117
-------------+----------------------------------------------------------------
      /sigma |    1.81047   .0623022                      1.688257    1.932683
------------------------------------------------------------------------------
  Observation summary:         0  left-censored observations
                             552     uncensored observations
                             893 right-censored observations
b. I graphed the hazard at the stated values of covariates using the Stata commands below.
The estimated hazard initially increases, until about t* = 4.6, where it reaches the value of
.0116 (roughly). It then falls, until it hits about .005 at t = 81. It may make sense that
there are startup costs to becoming involved in crime upon release, so that the instantaneous
probability of recidivism initially increases (for about four and one-half months). After that,
the hazard falls monotonically, although it does not become zero at the largest observed
duration, 81 months.
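
The shape of the estimated hazard can be checked with a short calculation. The following is a minimal sketch of the lognormal hazard, using the rounded index value 4.62 and scale 1.81 that appear in the Stata commands for this part; the function name and the use of a plain Python grid search are choices made here for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def lognormal_hazard(t, mu=4.62, sigma=1.81):
    """Hazard of a lognormal duration: phi(z) / [1 - Phi(z)] / (sigma * t), z = (log t - mu) / sigma."""
    z = (math.log(t) - mu) / sigma
    return _STD_NORMAL.pdf(z) / (1.0 - _STD_NORMAL.cdf(z)) / (sigma * t)


# Same grid as `range t .1 81 5000`: 5,000 equally spaced points from .1 to 81.
grid = [0.1 + i * (81.0 - 0.1) / 4999 for i in range(5000)]
t_star = max(grid, key=lognormal_hazard)

print(round(t_star, 2), round(lognormal_hazard(t_star), 4), round(lognormal_hazard(81.0), 4))
```

On this grid the maximum occurs near t = 4.57 with a hazard of roughly .0116, and the hazard at t = 81 is roughly .005, in line with the description above.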


. di = _b[_cons] + _b[felon] + _b[alcohol] + _b[drugs] + _b[priors]*1.431834
    + _b[tserved]*19.18201 + _b[educ]*9.702422 + _b[age]*345.436
4.616118

. clear

. range t .1 81 5000
obs was 0, now 5000

. gen hazard = (normalden((log(t) - 4.62)/1.81)/
    (1 - normal((log(t) - 4.62)/1.81)))/(1.81*t)

. egen maxhazard = max(hazard)

. list t hazard if hazard >= maxhazard

      +--------------------+
      |        t    hazard |
      |--------------------|
 277. | 4.566573   .011625 |
      +--------------------+

. twoway (line hazard t)

c. Using only the uncensored observations in a linear regression analysis provides very
different estimates. For example, the alcohol and drugs coefficients are much smaller in
magnitude, with the latter actually changing sign and becoming very insignificant.


. reg ldurat workprg priors tserved felon alcohol drugs black married educ age

      Source |       SS       df       MS              Number of obs =     552
-------------+------------------------------           F( 10,   541) =    4.13
       Model |  33.7647818    10  3.37647818           Prob > F      =  0.0000
    Residual |  442.796158   541  .818477187           R-squared     =  0.0709
-------------+------------------------------           Adj R-squared =  0.0537
       Total |   476.56094   551  .864901888           Root MSE      =   .9047

------------------------------------------------------------------------------
      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     workprg |   .0923415   .0827407     1.12   0.265    -.0701909     .254874
      priors |  -.0483627   .0140418    -3.44   0.001    -.0759459   -.0207795
     tserved |  -.0067761    .001938    -3.50   0.001     -.010583   -.0029692
       felon |   .1187173    .103206     1.15   0.251    -.0840163    .3214509
     alcohol |  -.2180496   .0970583    -2.25   0.025     -.408707   -.0273923
       drugs |   .0177737   .0891098     0.20   0.842    -.1572699    .1928172
       black |  -.0008505   .0822071    -0.01   0.992    -.1623348    .1606338
     married |   .2388998   .0987305     2.42   0.016     .0449577     .432842
        educ |  -.0194548   .0189254    -1.03   0.304    -.0566312    .0177215
         age |   .0005345   .0004228     1.26   0.207     -.000296    .0013651
       _cons |   3.001025   .2438418    12.31   0.000     2.522032    3.480017
------------------------------------------------------------------------------


d. Treating the censored durations as if they are uncensored also gives very different
estimates from the censored regression. Again, the estimated alcohol and drug effects are
attenuated toward zero, although not as much as when we drop all of the censored
observations. In any case, we should use censored regression analysis.

. reg ldurat workprg priors tserved felon alcohol drugs black married educ age

      Source |       SS       df       MS              Number of obs =    1445
-------------+------------------------------           F( 10,  1434) =   17.49
       Model |  134.350088    10  13.4350088           Prob > F      =  0.0000
    Residual |  1101.29155  1434  .767985737           R-squared     =  0.1087
-------------+------------------------------           Adj R-squared =  0.1025
       Total |  1235.64163  1444  .855707503           Root MSE      =  .87635

------------------------------------------------------------------------------
      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     workprg |    .008758   .0489457     0.18   0.858    -.0872548    .1047709
      priors |  -.0590636   .0091717    -6.44   0.000     -.077055   -.0410722
     tserved |  -.0094002   .0013006    -7.23   0.000    -.0119516   -.0068488
       felon |   .1785428   .0584077     3.06   0.002     .0639691    .2931165
     alcohol |  -.2628009   .0598092    -4.39   0.000    -.3801238   -.1454779
       drugs |  -.0907441   .0549372    -1.65   0.099      -.19851    .0170217
       black |  -.1791014   .0474354    -3.78   0.000    -.2721516   -.0860511
     married |   .1344326   .0554341     2.43   0.015      .025692    .2431732
        educ |   .0053914   .0099256     0.54   0.587    -.0140789    .0248618
         age |   .0013258   .0002249     5.90   0.000     .0008847    .0017669
       _cons |   3.569168    .137962    25.87   0.000     3.298539    3.839797
------------------------------------------------------------------------------


22.2. a. For this question, I use the streg command. The nohr option means that the $\hat{\beta}_j$
that estimate the $\beta_j$ in equation (22.25) are reported, rather than $\exp(\hat{\beta}_j)$. Whether or not a
release was “supervised” has no discernible effect on the hazard, whereas, not surprisingly, a
history of rules violation while in prison does increase the recidivism hazard.
. use recid.dta

. gen failed = ~cens

. stset durat, failure(failed)

     failure event:  failed != 0 & failed < .
obs. time interval:  (0, durat]
 exit on or before:  failure

------------------------------------------------------------------------------
     1445  total obs.
        0  exclusions
------------------------------------------------------------------------------
     1445  obs. remaining, representing
      552  failures in single record/single failure data
    80013  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        81

. streg super rules workprg priors tserved felon alcohol drugs black married
    educ age, dist(weibull) nohr

         failure _d:  failed
   analysis time _t:  durat

Weibull regression -- log relative-hazard form

No. of subjects =         1445                     Number of obs   =      1445
No. of failures =          552
Time at risk    =        80013
                                                   LR chi2(12)     =    170.51
Log likelihood  =    -1630.517                     Prob > chi2     =    0.0000

------------------------------------------------------------
          _t |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+----------------------------------------------
       super |  -.0078523   .0979703    -.1998705     .184166
       rules |   .0386963   .0166936     .0059774    .0714151
     workprg |   .1039345   .0914158    -.0752371    .2831061
      priors |    .086349   .0136871     .0595227    .1131752
     tserved |   .0116506    .001933     .0078621    .0154392
       felon |  -.3111997   .1074569    -.5218114   -.1005879
     alcohol |   .4510744   .1059953     .2433275    .6588214
       drugs |   .2623752   .0982732     .0697632    .4549872
       black |    .458454   .0884443     .2851063    .6318016
     married |  -.1563693     .10941    -.3708088    .0580703
        educ |  -.0246717    .019442    -.0627772    .0134339
         age |  -.0035167   .0005306    -.0045567   -.0024767
       _cons |  -3.466394   .3105515    -4.075064   -2.857724
-------------+----------------------------------------------
           p |   .8071455   .0313826     .7479219    .8710585
         1/p |   1.238934   .0481709     1.148028    1.337038
------------------------------------------------------------

b. The lognormal estimates are given below. The estimated coefficients on super and rules
are consistent with the Weibull results because a decrease in $x\beta$ shifts up the hazard in the
lognormal case.

. streg super rules workprg priors tserved felon alcohol drugs black married
    educ age, dist(lognormal)

         failure _d:  failed
   analysis time _t:  durat

Lognormal regression -- accelerated failure-time form

No. of subjects =         1445                     Number of obs   =      1445
No. of failures =          552
Time at risk    =        80013

Log likelihood  =   -1594.1683

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       super |   .0328411   .1280452     0.26   0.798    -.2181229    .2838052
       rules |  -.0644316   .0276338    -2.33   0.020    -.1185929   -.0102703
     workprg |  -.0883445   .1208173    -0.73   0.465     -.325142     .148453
      priors |  -.1341294   .0215358    -6.23   0.000    -.1763388   -.0919199
     tserved |  -.0156015   .0033872    -4.61   0.000    -.0222403   -.0089628
       felon |   .4345115   .1460924     2.97   0.003     .1481757    .7208473
     alcohol |  -.6415683   .1439736    -4.46   0.000    -.9237515   -.3593851
       drugs |  -.2785464   .1326321    -2.10   0.036    -.5385007   -.0185922
       black |   -.549173   .1172236    -4.68   0.000    -.7789271   -.3194189
     married |   .3308228   .1399244     2.36   0.018      .056576    .6050695
        educ |   .0234116   .0253302     0.92   0.355    -.0262347     .073058
         age |   .0036626   .0006117     5.99   0.000     .0024637    .0048614
       _cons |   4.173851   .3580214    11.66   0.000     3.472142     4.87556
------------------------------------------------------------------------------


c. The coefficient from the lognormal model directly estimates the proportional effect of
rules violations on the duration. So, one more rules violation reduces the estimated expected
duration by about 6.4%. To obtain the comparable Weibull estimate, we need
$-\hat{\beta}_{rules}/\hat{\alpha} = -.0387/.807 \approx -.048$, or about a 4.8% reduction for each rules violation, a bit
smaller than the lognormal estimate.
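
As a quick arithmetic check, the conversion from the Weibull hazard-form coefficient to the duration scale can be sketched in Python. The variable names are illustrative assumptions; the two numbers are the coefficient on rules and the estimate of p (the Weibull shape parameter) from the Weibull output in part a.

```python
# Weibull estimates in log relative-hazard form, taken from part a.
beta_rules = 0.0386963   # coefficient on rules
alpha_hat = 0.8071455    # Weibull shape estimate, reported by Stata as p

# Dividing by -alpha converts a hazard coefficient into the
# proportional effect on the duration itself.
aft_effect = -beta_rules / alpha_hat
print(round(aft_effect, 3))  # -0.048
```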

22.3. a. If all durations in the sample are censored, $d_i = 0$ for all $i$, and so the log-likelihood
is $\sum_{i=1}^N \log[1 - F(t_i|x_i;\theta)] = \sum_{i=1}^N \log[1 - F(c_i|x_i;\theta)]$.

b. For the Weibull case, $F(t|x_i;\theta) = 1 - \exp[-\exp(x_i\beta)t^{\alpha}]$, and so the log-likelihood is
$-\sum_{i=1}^N \exp(x_i\beta)c_i^{\alpha}$.

c. Without covariates, the Weibull log-likelihood with all observations censored is
$-\exp(\beta)\sum_{i=1}^N c_i^{\alpha}$. Because $c_i > 0$, we can choose any $\alpha > 0$ so that $\sum_{i=1}^N c_i^{\alpha} > 0$. But then, for
any $\alpha > 0$, the log-likelihood is maximized by minimizing $\exp(\beta)$ across $\beta$. But as
$\beta \to -\infty$, $\exp(\beta) \to 0$. Therefore, plugging any value $\alpha$ into the log-likelihood will lead to $\beta$
getting more and more negative without bound. So no two real numbers for $\alpha$ and $\beta$ maximize
the log likelihood.
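
The lack of a maximizer can be seen numerically. The following is a minimal Python sketch, assuming three made-up censoring times and an arbitrary fixed value of the shape parameter; the function name is hypothetical. The log-likelihood rises toward its supremum of zero as the intercept decreases, but never attains it.

```python
import math

def censored_weibull_loglik(alpha, beta, c):
    """Log-likelihood -exp(beta) * sum(c_i ** alpha) when every duration is censored."""
    return -math.exp(beta) * sum(ci ** alpha for ci in c)

c = [2.0, 5.0, 8.0]                 # hypothetical censoring times
alpha = 1.5                         # any fixed alpha > 0
betas = (0.0, -5.0, -10.0, -20.0)   # increasingly negative intercepts

values = [censored_weibull_loglik(alpha, b, c) for b in betas]
for b, v in zip(betas, values):
    print(b, v)                     # values increase toward 0 as beta falls
```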

d. It is not possible to estimate duration models from flow data when all durations are right 
censored. 

e. To have all durations censored in a large sample we would have to have $P(t_i^* > c_i)$ very
close to one. But if $P(t_i^* \le t) > 0$ for all $t > 0$ and $c_i \ge b > 0$,

$$P(t_i^* > c_i) \le P(t_i^* > b) = 1 - P(t_i^* \le b),$$

and $P(t_i^* \le b) > 0$. So with large samples we should not expect to find that all durations have
been censored.
22.4. a. The binary response $d_i$ is equal to one if the observation is uncensored. Because $t_i^*$
is independent of $c_i$ conditional on $x_i$,

$$P(d_i = 1|x_i,c_i) = P(t_i^* \le c_i|x_i) = F(c_i|x_i;\theta).$$

Therefore, the log-likelihood is

$$\sum_{i=1}^N \{d_i \log[F(c_i|x_i;\theta)] + (1 - d_i)\log[1 - F(c_i|x_i;\theta)]\},$$

which is just of the usual binary response form.
b. When the distribution is Weibull and $x_i = 1$, we have (from Problem 22.3c),
$F(c_i|\theta) = 1 - \exp[-\exp(\beta)c_i^{\alpha}]$, and so the log-likelihood is

$$\mathcal{L}(\alpha,\beta) = \sum_{i=1}^N \{d_i \log[1 - \exp[-\exp(\beta)c_i^{\alpha}]] + (1 - d_i)\log(\exp[-\exp(\beta)c_i^{\alpha}])\}.$$

If $c_i = c > 0$ for all $i$, we have

$$\mathcal{L}(\alpha,\beta) = \sum_{i=1}^N \{d_i \log[1 - \exp[-\exp(\beta)c^{\alpha}]] + (1 - d_i)\log(\exp[-\exp(\beta)c^{\alpha}])\}.$$

If we define $\rho \equiv 1 - \exp[-\exp(\beta)c^{\alpha}]$ then $0 < \rho < 1$, and the log-likelihood can be written as

$$\sum_{i=1}^N [d_i \log(\rho) + (1 - d_i)\log(1 - \rho)].$$

In other words, the log-likelihood is the same for all combinations of $\alpha$ and $\beta$ that give the
same value of $\rho$. While we can consistently estimate $\rho$ (the fraction of uncensored
observations is the maximum likelihood estimator), we cannot recover estimates of $\alpha$ and $\beta$. In
other words, $\alpha$ and $\beta$ are not identified.
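
The identification failure with a common censoring time can be illustrated with a small Python sketch. The censoring indicators, the censoring time, and the parameter values are all made up for illustration, and `loglik` is a hypothetical helper; `rho` in the code is the probability that an observation is uncensored. Two different parameter pairs that imply the same probability give exactly the same log-likelihood.

```python
import math

def loglik(alpha, beta, d, c):
    """Binary-response log-likelihood: Weibull, no covariates, common censoring time c."""
    rho = 1.0 - math.exp(-math.exp(beta) * c ** alpha)   # P(d_i = 1)
    return sum(di * math.log(rho) + (1 - di) * math.log(1.0 - rho) for di in d)

d = [1, 0, 0, 1, 1, 0, 0, 0]   # hypothetical indicators (1 = uncensored)
c = 3.0                        # hypothetical common censoring time

alpha1, beta1 = 1.0, -1.0
alpha2 = 2.0
# Pick beta2 so that exp(beta2) * c**alpha2 equals exp(beta1) * c**alpha1.
beta2 = beta1 + (alpha1 - alpha2) * math.log(c)

ll1 = loglik(alpha1, beta1, d, c)
ll2 = loglik(alpha2, beta2, d, c)
print(ll1, ll2)                # identical up to rounding error
```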
c. In the log-normal case, the log-likelihood is based on
$P[\log(t_i^*) \le \log(c_i)|x_i] = \Phi[(1/\sigma)\log(c_i) - (1/\sigma)x_i\beta]$, because
$\log(t_i^*)|(x_i,c_i) \sim \mathrm{Normal}(x_i\beta,\sigma^2)$. The log-likelihood is

$$\mathcal{L}(\beta,\sigma^2) = \sum_{i=1}^N \ell_i(\beta,\sigma^2) = \sum_{i=1}^N \{d_i \log(\Phi[(1/\sigma)\log(c_i) - (1/\sigma)x_i\beta]) + (1 - d_i)\log(1 - \Phi[(1/\sigma)\log(c_i) - (1/\sigma)x_i\beta])\}.$$

Even though $x_{i1} = 1$, $1/\sigma$ is identified as the coefficient on $\log(c_i)$, provided $c_i$ varies. Then, of
course, we can identify $\beta$ because $\beta/\sigma$ is generally identified from the probit log-likelihood.
[We would need to assume the usual condition that rank $E(x_i'x_i) = K$.] If $c_i = c$ for all $i$ then
the intercept effectively becomes $(1/\sigma)\log(c) - \beta_1/\sigma$; along with $\beta_j/\sigma$, $j = 2,\ldots,K$, these are
the only parameters we can identify. We cannot separately identify $\beta$ or $\sigma$. This is the same
situation we faced in Section 19.2.1.


22.5. a. We have

$$\begin{aligned}
P(t_i^* \le t|x_i,a_i,c_i,s_i = 1) &= P(t_i^* \le t|x_i, t_i^* \ge b - a_i) \\
&= P(t_i^* \le t, t_i^* \ge b - a_i|x_i)/P(t_i^* \ge b - a_i|x_i) \\
&= P(b - a_i \le t_i^* \le t|x_i)/P(t_i^* \ge b - a_i|x_i) \\
&= [F(t|x_i) - F(b - a_i|x_i)]/[1 - F(b - a_i|x_i)],
\end{aligned}$$

where we use the fact that $t \ge b - a_i$.
b. The derivative of the cdf in part a, with respect to $t$, is simply $f(t|x_i)/[1 - F(b - a_i|x_i)]$.
c. Because $s_i = 1[t_i^* \ge b - a_i]$ and $t_i = c_i$ if and only if $t_i^* \ge c_i$, we have

$$\begin{aligned}
P(t_i = c_i|x_i,a_i,c_i,s_i = 1) &= P(t_i^* \ge c_i|x_i, t_i^* \ge b - a_i) \\
&= P(t_i^* \ge c_i|x_i)/P(t_i^* \ge b - a_i|x_i) \\
&= [1 - F(c_i|x_i)]/[1 - F(b - a_i|x_i)],
\end{aligned}$$

where the second equality follows because $c_i > b - a_i$.
d. Parts b and c show that the density of the censored variable, $t_i$, conditional on
$(x_i,a_i,c_i,s_i = 1)$, can be written as

$$[f(t|x_i)]^{1[t<c_i]}[1 - F(c_i|x_i)]^{1[t=c_i]}/[1 - F(b - a_i|x_i)].$$

Showing the dependence on the parameters, plugging in $t_i$, noting that $d_i = 1[t_i < c_i]$, and
taking the log gives

$$d_i \log[f(t_i|x_i;\theta)] + (1 - d_i)\log[1 - F(c_i|x_i;\theta)] - \log[1 - F(b - a_i|x_i;\theta)].$$

Summing across all $N$ observations gives equation (22.30).
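
To make the per-observation contribution concrete, here is a minimal Python sketch that assumes a Weibull distribution for the duration. The function names and the numerical inputs are hypothetical and chosen only for illustration; the point is the extra term that corrects for left truncation at b - a.

```python
import math

def weibull_cdf(t, xb, alpha):
    """F(t|x) = 1 - exp[-exp(xb) * t**alpha]."""
    return 1.0 - math.exp(-math.exp(xb) * t ** alpha)

def weibull_pdf(t, xb, alpha):
    """Derivative of weibull_cdf with respect to t."""
    lam = math.exp(xb)
    return lam * alpha * t ** (alpha - 1.0) * math.exp(-lam * t ** alpha)

def stock_loglik_i(t, d, c, a, b, xb, alpha):
    """d*log f(t) + (1-d)*log[1-F(c)] - log[1-F(b-a)] for one observation."""
    if d == 1:
        main = math.log(weibull_pdf(t, xb, alpha))
    else:
        main = math.log(1.0 - weibull_cdf(c, xb, alpha))
    # Correction for sampling only spells that last at least b - a.
    return main - math.log(1.0 - weibull_cdf(b - a, xb, alpha))

# Hypothetical uncensored spell of length 2 that started at a = 0.5 with b = 1.
print(stock_loglik_i(t=2.0, d=1, c=5.0, a=0.5, b=1.0, xb=-1.0, alpha=1.2))
```

When a = b the truncation point is zero, the correction term vanishes, and the contribution reduces to the usual right-censoring form.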


22.6. In what follows, we initially suppress dependence on the parameters.

a. Because $s_i = 1[t_i^* \ge b - a_i]$,

$$P(a_i \le a, s_i = 1|x_i) = P(a_i \le a, t_i^* \ge b - a_i|x_i) = \int_0^a \int_{b-u}^{\infty} q(u,v|x_i)\,dv\,du,$$

where $q(\cdot,\cdot|x_i)$ denotes the joint density of $(a_i,t_i^*)$ given $x_i$. By conditional independence,
$q(a,t|x_i) = k(a|x_i)f(t|x_i)$, and so the double integral is

$$\int_0^a \left[\int_{b-u}^{\infty} f(v|x_i)\,dv\right]k(u|x_i)\,du = \int_0^a [1 - F(b - u|x_i)]k(u|x_i)\,du$$

because $\int_{b-u}^{\infty} f(v|x_i)\,dv = 1 - F(b - u|x_i)$.

b. From the hint, we first compute $E(s_i|a_i,x_i) = P(t_i^* \ge b - a_i|a_i,x_i) = 1 - F(b - a_i|x_i)$.
Next, we compute the expected value of this with respect to the distribution $D(a_i|x_i)$, which is
simply equation (22.32).

c. The conditional cdf is obtained by dividing the answer from part a by the answer from
part b. The density is just the derivative of the resulting expression with respect to $a$; by the
fundamental theorem of calculus, the derivative is (22.31).

d. When $b = 1$ and $k(a|x_i) = 1$, all $0 < a < 1$, the numerator of (22.31) is just
$1 - F(1 - a|x_i)$. The denominator is simply $\int_0^1 [1 - F(1 - u|x_i)]\,du$.

e. In the Weibull case, $1 - F(1 - a|x_i) = \exp[-\exp(x_i\beta)(1 - a)^{\alpha}]$ and the denominator is
$\int_0^1 \exp[-\exp(x_i\beta)(1 - u)^{\alpha}]\,du$. This integral cannot be solved in closed form unless $\alpha = 1$.
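
Since the denominator generally has no closed form, in practice it would be evaluated numerically. The sketch below uses composite Simpson's rule in plain Python as one possible approach; this is an assumption for illustration, not the method used in the text. The index value and function names are hypothetical. With the shape parameter equal to one, the integral equals (1 - exp(-λ))/λ with λ = exp(xβ), which gives a check on the numerical answer.

```python
import math

def simpson(func, lo, hi, n=200):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * func(lo + j * h)
    return total * h / 3.0

def selection_prob(xb, alpha):
    """Integral over u in [0, 1] of exp[-exp(xb) * (1 - u)**alpha]."""
    lam = math.exp(xb)
    return simpson(lambda u: math.exp(-lam * (1.0 - u) ** alpha), 0.0, 1.0)

xb = 0.3                                      # hypothetical index value
lam = math.exp(xb)
closed_form = (1.0 - math.exp(-lam)) / lam    # valid only when alpha = 1
numeric_alpha1 = selection_prob(xb, 1.0)      # should match closed_form
numeric_alpha2 = selection_prob(xb, 1.7)      # no closed form available
print(closed_form, numeric_alpha1, numeric_alpha2)
```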


22.7. a. For notational simplicity, the parameters in the densities are suppressed. Then, by
equation (22.22) and $D(a_i|c_i,x_i) = D(a_i|x_i)$, the density of $(a_i,t_i^*)$ given $(c_i,x_i)$ does not
depend on $c_i$ and is given by $k(a|x_i)f(t|x_i)$ for $0 < a < b$ and $0 < t < \infty$. This is also the
conditional density of $(a_i,t_i)$ given $(c_i,x_i)$ for $t < c_i$, that is, for values of $t$ corresponding to
being uncensored. For $t = c_i$, the density is $k(a|x_i)[1 - F(c_i|x_i)]$ by the usual right censoring
argument. Now, the probability of observing the random draw $(a_i,c_i,x_i,t_i)$, conditional on $x_i$,
is $P(t_i^* \ge b - a_i|x_i)$, which is exactly (22.32). From the standard result for densities for
truncated distributions, the density of $(a_i,t_i)$ given $(c_i,d_i,x_i)$ and $s_i = 1$ is

$$k(a|x_i)[f(t|x_i)]^{d_i}[1 - F(c_i|x_i)]^{1-d_i}/P(s_i = 1|x_i),$$

for all combinations $(a,t)$ such that $s_i = 1$. Putting in the observed data, inserting the
parameters, and taking the log gives (22.56).

b. We have the usual trade-off between robustness and efficiency. Using the log likelihood
(22.56) results in more efficient estimators provided we have the two densities correctly
specified; (22.30) requires us to specify only $f(\cdot|x_i)$.

22.8. a. Again I suppress the parameters in the densities. Let $z_i = t_i^* - b$, so that, by
assumption, $a_i$ and $z_i$ are independent conditional on $x_i$. The conditional density of $z_i$ is simply
$g(z|x_i) = f(z + b|x_i)$, $z > -b$. By the usual convolution formula for the density of a sum of
independent random variables,

$$h(r|x_i) = \int_0^b k(u|x_i)g(r - u|x_i)\,du = \int_0^b k(u|x_i)f(r + b - u|x_i)\,du,$$

where $f(r + b - u|x_i) = 0$ if $r + b - u < 0$. When $r > 0$, $r + b - u > 0$ for all $0 < u < b$, and so
we need not modify the formula.

b. As usual for right censoring, $P(r_i = q|x_i) = P(r_i^* \ge q|x_i) = 1 - H(q|x_i)$.

c. The argument is now essentially the same as the first stock sampling argument treated in
Section 22.3.3, except that we cannot condition on $a_i$. Instead, for $0 < r < q$,
$P(r_i \le r|x_i,s_i = 1) = [H(r|x_i) - H(0|x_i)]/P(s_i = 1|x_i)$, and so the density for $0 < r < q$ is
$h(r|x_i)/P(s_i = 1|x_i)$, and $P(r_i = q|x_i,s_i = 1) = [1 - H(q|x_i)]/P(s_i = 1|x_i)$, where $P(s_i = 1|x_i)$
is given by (22.32).

d. With $b = 1$ and a uniform distribution for $a_i$, the log-likelihood reduces to

$$d_i \log[h(r_i|x_i;\theta,\eta)] + (1 - d_i)\log[1 - H(r_i|x_i;\theta,\eta)] - \log\left\{\int_0^1 [1 - F(1 - u|x_i;\theta)]\,du\right\}.$$

22.9. a. Let $\omega$ be the value for type B people. Then we must have

$$\rho\eta + (1 - \rho)\omega = 1$$

or $\omega = (1 - \rho\eta)/(1 - \rho)$.
b. The cdf conditional on $(x,v)$ is $F(t|x,v;\alpha,\beta) = 1 - \exp[-v\exp(x\beta)t^{\alpha}]$. Therefore, the cdf
conditional on $x$ is obtained by averaging out $v$:

$$\begin{aligned}
G(t|x;\alpha,\beta,\eta,\rho) &= \rho\{1 - \exp[-\eta\exp(x\beta)t^{\alpha}]\} \\
&\quad + (1 - \rho)\{1 - \exp[-((1 - \rho\eta)/(1 - \rho))\exp(x\beta)t^{\alpha}]\}.
\end{aligned}$$

c. The density function is just the derivative with respect to $t$:

$$\begin{aligned}
g(t|x;\alpha,\beta,\eta,\rho) &= \rho\eta\exp(x\beta)\alpha t^{\alpha-1}\exp[-\eta\exp(x\beta)t^{\alpha}] \\
&\quad + (1 - \rho\eta)\exp(x\beta)\alpha t^{\alpha-1}\exp[-((1 - \rho\eta)/(1 - \rho))\exp(x\beta)t^{\alpha}].
\end{aligned}$$

If none of the durations are censored, the log-likelihood for each observation $i$ is obtained by
taking the log of this density and plugging in $(x_i,t_i)$. If we have right censored data with
censoring values $c_i$, the log likelihood takes on the usual form:

$$d_i \log[g(t_i|x_i;\alpha,\beta,\eta,\rho)] + (1 - d_i)\log[1 - G(t_i|x_i;\alpha,\beta,\eta,\rho)],$$

where $d_i$ is the dummy variable equal to one if the observation is not censored. The log
likelihood for the entire sample is a very smooth function of all parameters. As a
computational device, it might be better to replace $\eta$ with, say, $\exp(\xi)/[1 + \exp(\xi)]$ for
$-\infty < \xi < \infty$ and similarly for $\rho$.
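
The two-type mixture can be written out as a short Python sketch. The function names and the parameter values in the example call are hypothetical; censored observations contribute the log of the survivor function, one minus the mixture cdf, as in the usual right-censoring likelihood.

```python
import math

def mix_cdf(t, xb, alpha, eta, rho):
    """Mixture cdf: a share rho has heterogeneity eta, the rest has omega."""
    omega = (1.0 - rho * eta) / (1.0 - rho)   # ensures E(v) = 1
    lam = math.exp(xb)
    return (rho * (1.0 - math.exp(-eta * lam * t ** alpha))
            + (1.0 - rho) * (1.0 - math.exp(-omega * lam * t ** alpha)))

def mix_pdf(t, xb, alpha, eta, rho):
    """Derivative of mix_cdf with respect to t."""
    omega = (1.0 - rho * eta) / (1.0 - rho)
    lam = math.exp(xb)
    base = lam * alpha * t ** (alpha - 1.0)
    return (rho * eta * base * math.exp(-eta * lam * t ** alpha)
            + (1.0 - rho * eta) * base * math.exp(-omega * lam * t ** alpha))

def loglik_i(t, d, xb, alpha, eta, rho):
    """Density for uncensored spells (d = 1), survivor function otherwise."""
    if d == 1:
        return math.log(mix_pdf(t, xb, alpha, eta, rho))
    return math.log(1.0 - mix_cdf(t, xb, alpha, eta, rho))

# Hypothetical parameter values, for illustration only.
print(loglik_i(2.0, 1, xb=-1.0, alpha=1.3, eta=0.5, rho=0.4))
print(loglik_i(2.0, 0, xb=-1.0, alpha=1.3, eta=0.5, rho=0.4))
```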
22.10. a. If $P(T > a_{m-1}) = 0$ then $P(T > a_m) = 0$ because $a_m > a_{m-1}$, in which case the
equality is trivial. So assume that $P(T > a_{m-1}) > 0$. Then, by definition of conditional
probability,

$$\begin{aligned}
P(T > a_m|T > a_{m-1}) &= P(T > a_m, T > a_{m-1})/P(T > a_{m-1}) \\
&= P(T > a_m)/P(T > a_{m-1})
\end{aligned}$$

since the events $\{T > a_m, T > a_{m-1}\}$ and $\{T > a_m\}$ are identical when $a_m > a_{m-1}$.
Rearranging the equality gives the result.

b. We can use induction to obtain an algebraically simple proof. First, equation (22.48)
holds trivially when $m = 1$: $P(T > a_1) = P(T > a_1|T > 0)$ because $P(T > 0) = 1$. Now assume
that (22.48) holds for some $m \ge 1$. We show it holds for $m + 1$. By part a,

$$\begin{aligned}
P(T > a_{m+1}) &= P(T > a_{m+1}|T > a_m)P(T > a_m) \\
&= P(T > a_{m+1}|T > a_m)\prod_{r=1}^{m} P(T > a_r|T > a_{r-1})
\end{aligned}$$

because $P(T > a_m) = \prod_{r=1}^{m} P(T > a_r|T > a_{r-1})$ by the induction hypothesis. It follows that

$$P(T > a_{m+1}) = \prod_{r=1}^{m+1} P(T > a_r|T > a_{r-1})$$

and this completes the proof.
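
The product formula is easy to verify numerically. The following Python sketch assumes an exponential duration with a made-up rate and an arbitrary increasing grid of cutoffs starting at zero; these are illustrative choices, not part of the problem.

```python
import math

def survivor(a, rate=0.1):
    """P(T > a) for an exponential duration; the rate is a hypothetical value."""
    return math.exp(-rate * a)

cutoffs = [0.0, 1.0, 2.5, 4.0, 7.0]   # a_0 = 0 < a_1 < ... < a_4

product = 1.0
for prev, cur in zip(cutoffs[:-1], cutoffs[1:]):
    # P(T > a_r | T > a_{r-1}) = P(T > a_r) / P(T > a_{r-1}) by part a.
    product *= survivor(cur) / survivor(prev)

direct = survivor(cutoffs[-1])        # P(T > a_4) computed directly
print(product, direct)
```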


22.11. a. The estimates from the log-logistic model with gamma-distributed heterogeneity are
given below. For comparison purposes, the Stata output used to produce Table 22.2 follows.
The log likelihood for the log-logistic model is −1,587.92 and that for the Weibull model
(both with gamma heterogeneity) is −1,584.92. The Weibull model fits somewhat better. (A
Vuong model selection statistic could be computed to see if the fit is statistically better.)


streg workprg priors tserved felon alcohol drugs black married educ age, 
d(loglogistic) fr(gamma) 


         failure _d:  failed
   analysis time _t:  durat

Loglogistic regression -- accelerated failure-time form
                          Gamma frailty

No. of subjects =         1445                     Number of obs   =      1445
No. of failures =          552
Time at risk    =        80013
                                                   LR chi2(10)
Log likelihood  =     -1587.92                     Prob > chi2

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     workprg |   .0098501               0.934    -.2231504    .2428505
      priors |  -.1488187               0.000      -.19421   -.1034275
     tserved |  -.0190707               0.000    -.0261553   -.0119861
       felon |   .4219072               0.005     .1274467    .7163677
     alcohol |  -.6597875               0.000    -.9489547   -.3706203
       drugs |  -.2156168               0.100    -.4724563    .0412227
       black |  -.4355534               0.000    -.6725147   -.1985921
     married |   .4345344               0.001       .16927    .6997988
        educ |   .0243336               0.360    -.0277648    .0764321
         age |   .0032531               0.000     .0020124    .0044937
       _cons |    3.34599               0.000     2.646117    4.045864
-------------+----------------------------------------------------------------
     /ln_gam |  -.3648587   .0716481    0.000    -.5052864   -.2244309
     /ln_the |   .8176437   .1998151    0.000     .4260133    1.209274
-------------+----------------------------------------------------------------
       gamma |   .6942948   .0497449              .6033327    .7989708
       theta |   2.265156   .4526125              1.531141    3.351051
------------------------------------------------------------------------------


Likelihood-ratio test of theta=0: chibar2(01) = 46.05 Prob>=chibar2 = 0.000 


streg workprg priors tserved felon alcohol drugs black married educ age, 
d(weibull) fr(gamma) nohr 


failure _d: nocens 
analysis time _t: durat 


Weibull regression -- log relative-hazard form 
Gamma frailty 


No. of subjects =         1445                     Number of obs   =      1445
No. of failures =          552
Time at risk    =        80013
                                                   LR chi2(10)     =    143.82
Log likelihood  =   -1584.9172                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     workprg |   .0073827   .2038775     0.04   0.971    -.3922099    .4069753
      priors |   .2431142   .0421543     5.77   0.000     .1604933    .3257352
     tserved |   .0349363   .0070177     4.98   0.000     .0211818    .0486908
       felon |  -.7909533   .2666084    -2.97   0.003    -1.313496   -.2684104
     alcohol |   1.173558   .2805222     4.18   0.000     .6237451    1.723372
       drugs |   .2847665   .2233072     1.28   0.202    -.1529074    .7224405
       black |   .7715762   .2038289     3.79   0.000      .372079    1.171073
     married |  -.8057042   .2578214    -3.13   0.002    -1.311025   -.3003834
        educ |  -.0271193    .044901    -0.60   0.546    -.1151237     .060885
         age |  -.0052162   .0009974    -5.23   0.000    -.0071711   -.0032613
       _cons |  -5.393658    .720245    -7.49   0.000    -6.805312   -3.982004
-------------+----------------------------------------------------------------
       /ln_p |   .5352553   .0951206     5.63   0.000     .3488225    .7216882
     /ln_the |   1.790243   .1788498    10.01   0.000     1.439703    2.140782
-------------+----------------------------------------------------------------
           p |   1.707884   .1624549                      1.417398    2.057904
         1/p |   .5855198    .055695                      .4859312    .7055184
       theta |   5.990906   1.071472                      4.219445    8.506084
------------------------------------------------------------------------------
Likelihood-ratio test of theta=0: chibar2(01) =    96.23 Prob>=chibar2 = 0.000
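
Stata estimates the ancillary parameters in logs and reports the transformed values in the bottom rows of the table. As a small check in Python (variable names are illustrative), exponentiating the /ln_p and /ln_the estimates from the Weibull output above reproduces the reported p, 1/p, and theta up to rounding.

```python
import math

ln_p = 0.5352553      # /ln_p from the Weibull output with gamma frailty
ln_theta = 1.790243   # /ln_the from the same output

p = math.exp(ln_p)           # Weibull shape parameter
inv_p = 1.0 / p
theta = math.exp(ln_theta)   # variance of the gamma heterogeneity
print(p, inv_p, theta)
```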


b. The conditional hazard is plotted below. Its shape is very different from the Weibull
case, which is always upward sloping. (See Figure 22.3.) The hazard for the log-logistic model
with gamma heterogeneity has its maximum value near 30 weeks, and it falls off gradually
after that.

c. The unconditional hazard is plotted below. While it also has a hump shape, its
maximum occurs at around 12, which is well below that for the conditional hazard.


. stcurve, haz 
(option unconditional assumed) 


[Figure: Unconditional Hazard (Log-logistic plus Gamma)]

There is a final point worth making about this example that builds on the discussion in
Section 22.3.4. Below is the graph in Figure 22.4, reproduced for convenience. It is the
unconditional hazard for the Weibull model with gamma heterogeneity. While it differs
somewhat from the unconditional hazard for the log-logistic model with gamma heterogeneity,
it is practically very similar. Both hazards have sharp increases until about 12 months, and then
fall off to zero more gradually. In other words, when we study features of the distribution
$D(t_i^*|x_i)$, which is what we can generally hope to identify, we get pretty similar findings. Yet
the hazards based on $D(t_i^*|x_i,v_i)$ are very different. Recall that the hazard for $D(t_i^*|x_i,v_i)$ in the
Weibull case is of the proportional hazard form while that for the log-logistic is not (which is
apparent when studying the plots of the conditional hazards). Given that the models fit the data
roughly equally well, and that they give similar shapes for the hazard of $D(t_i^*|x_i)$, trying to
decide whether the conditional hazard has a hump shape, as in part b, or is strictly
increasing, as in Figure 22.3, seems pretty hopeless.

Incidentally, if we use a gamma distribution for $D(t_i^*|x_i,v_i)$ with gamma heterogeneity, we
obtain an unconditional hazard very similar to the Weibull and log-logistic cases. The
conditional hazard is similar to the log-logistic case, and the log likelihood value is −1,585.09.